Importing Dataset¶

In [1]:
# Importing required libraries
import pandas as pd
import numpy as np
import plotly.express as plx
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import sklearn
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
import time
from tqdm import tqdm
In [2]:
# Column names stored in a list, to apply while importing the csv file from the dataset location
column_names = ['CIC0','SM1_Dz(Z)','GATS1i','NdsCH','NdssC','MLOGP', 'LC50']
In [3]:
# The data in the csv file is separated with a semicolon (;), so we set the delimiter to ';'
data = pd.read_csv('C:\\Users\\harip\\INEURON_PROJECTS\\Prediction of LC50\\LC50_Project\\data\\dataset(csv)\\qsar_fish_toxicity.csv', header = None, delimiter = ';', names = column_names)
In [4]:
# Looking at the data
data
Out[4]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
0 3.260 0.829 1.676 0 1 1.453 3.770
1 2.189 0.580 0.863 0 0 1.348 3.115
2 2.125 0.638 0.831 0 0 1.348 3.531
3 3.027 0.331 1.472 1 0 1.807 3.510
4 2.094 0.827 0.860 0 0 1.886 5.390
... ... ... ... ... ... ... ...
903 2.801 0.728 2.226 0 2 0.736 3.109
904 3.652 0.872 0.867 2 3 3.983 4.040
905 3.763 0.916 0.878 0 6 2.918 4.818
906 2.831 1.393 1.077 0 1 0.906 5.317
907 4.057 1.032 1.183 1 3 4.754 8.201

908 rows × 7 columns

Dataset Information¶

In [5]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 908 entries, 0 to 907
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   CIC0       908 non-null    float64
 1   SM1_Dz(Z)  908 non-null    float64
 2   GATS1i     908 non-null    float64
 3   NdsCH      908 non-null    int64  
 4   NdssC      908 non-null    int64  
 5   MLOGP      908 non-null    float64
 6   LC50       908 non-null    float64
dtypes: float64(5), int64(2)
memory usage: 49.8 KB

There are no null values in the dataset.
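The non-null counts from data.info() can also be confirmed directly with isnull(). A minimal sketch of the pattern, run here on a small synthetic frame rather than the real dataset:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset (illustrative columns, not the real file)
df = pd.DataFrame({'CIC0': [3.26, 2.19, np.nan], 'MLOGP': [1.45, 1.35, 1.89]})

null_counts = df.isnull().sum()       # per-column count of missing values
has_nulls = df.isnull().values.any()  # single boolean for the whole frame
```

On the real data, `data.isnull().sum()` would return 0 for every column, matching the info() report above.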

Handling Duplicates¶

Here, we're removing duplicate records from the dataset¶

In [6]:
def handling_duplicates(data):
    # Count duplicates once, drop them, then report
    n_duplicates = data.duplicated().sum()
    if n_duplicates > 0:
        data.drop_duplicates(inplace=True)
        print(f"There's a {n_duplicates} duplicated record in the dataset and removed successfully.")
        print("The Data passed the 'duplicates_handling()' and moved to check datatypes of the features")
    else:
        print("[data_transformation.py] While Handling the data, there's no duplicates.") 
    return data
In [7]:
handling_duplicates(data)
There's a 1 duplicated record in the dataset and removed successfully.
The Data passed the 'duplicates_handling()' and moved to check datatypes of the features
Out[7]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
0 3.260 0.829 1.676 0 1 1.453 3.770
1 2.189 0.580 0.863 0 0 1.348 3.115
2 2.125 0.638 0.831 0 0 1.348 3.531
3 3.027 0.331 1.472 1 0 1.807 3.510
4 2.094 0.827 0.860 0 0 1.886 5.390
... ... ... ... ... ... ... ...
903 2.801 0.728 2.226 0 2 0.736 3.109
904 3.652 0.872 0.867 2 3 3.983 4.040
905 3.763 0.916 0.878 0 6 2.918 4.818
906 2.831 1.393 1.077 0 1 0.906 5.317
907 4.057 1.032 1.183 1 3 4.754 8.201

907 rows × 7 columns

Checking Datatypes of Features in the Dataset¶

Here, we're checking the data type of each feature, so that dtype conflicts are caught early rather than causing problems later in the project.¶

In [8]:
def get_check_dtypes(data):
    df_types = pd.DataFrame(data.dtypes)
    df_types.reset_index(inplace=True)
    df_types.rename(columns={'index': 'col_name', 0: 'data_type'}, inplace=True)
    print("Got Datatypes of each column successfully")

    # Collect any column whose dtype is neither integer nor float
    problematic_columns = []
    for i in range(len(df_types)):
        dtype_name = str(df_types['data_type'][i])
        if 'int' not in dtype_name and 'float' not in dtype_name:
            problematic_columns.append(df_types['col_name'][i])

    if len(problematic_columns) == 0:
        print("There is no problem with the datatype of each column. The data passed 'get_check_dtypes()' successfully.")
        return data
    else:
        print(f"There is a problem with the datatype of column -> {problematic_columns}")
        print("The data holds a non-numeric feature, so it is not moved further in this project. Please resolve this!!")
In [9]:
get_check_dtypes(data)
Got Datatypes of each column successfully
There is no problem with the datatype of each column. The data passed 'get_check_dtypes()' successfully.
Out[9]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
0 3.260 0.829 1.676 0 1 1.453 3.770
1 2.189 0.580 0.863 0 0 1.348 3.115
2 2.125 0.638 0.831 0 0 1.348 3.531
3 3.027 0.331 1.472 1 0 1.807 3.510
4 2.094 0.827 0.860 0 0 1.886 5.390
... ... ... ... ... ... ... ...
903 2.801 0.728 2.226 0 2 0.736 3.109
904 3.652 0.872 0.867 2 3 3.983 4.040
905 3.763 0.916 0.878 0 6 2.918 4.818
906 2.831 1.393 1.077 0 1 0.906 5.317
907 4.057 1.032 1.183 1 3 4.754 8.201

907 rows × 7 columns

Handling Missing Data¶

Here, we handle any missing values. If there are none, the data passes through unchanged apart from the column reordering introduced by the transformer.¶

In [10]:
# Handle any missing values; if there are none, the data passes through unchanged
def handling_missing_values(data):
    try:
        data_dep = data['LC50']
        data_indp = data.drop(['LC50'], axis=1)
        mean_impute_cols = ['CIC0', 'SM1_Dz(Z)', 'GATS1i', 'MLOGP']
        mode_impute_cols = ['NdsCH', 'NdssC']
        transformer = ColumnTransformer(transformers=[
            ("tf1", SimpleImputer(strategy='mean'), mean_impute_cols),
            ("tf2", SimpleImputer(strategy='most_frequent'), mode_impute_cols)
        ])
        trans_data = transformer.fit_transform(data_indp)
        # ColumnTransformer outputs the columns in transformer order (mean-imputed first)
        column_names = ['CIC0', 'SM1_Dz(Z)', 'GATS1i', 'MLOGP', 'NdsCH', 'NdssC']
        new_data = pd.DataFrame(trans_data, columns=column_names)
        new_data['LC50'] = list(data_dep)
        # The target itself may still hold missing values, so drop any remaining NaN rows
        new_data = new_data.dropna()
        print("The data has passed 'handling_missing_values()' successfully.")
        return new_data
    except Exception:
        print("The data didn't pass 'handling_missing_values()'. So, please resolve this problem.")
In [11]:
handling_missing_values(data)
The data has passed 'handling_missing_values()' successfully.
C:\Users\harip\anaconda3\lib\site-packages\sklearn\impute\_base.py:49: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode = stats.mode(array)
Out[11]:
CIC0 SM1_Dz(Z) GATS1i MLOGP NdsCH NdssC LC50
0 3.260 0.829 1.676 1.453 0.0 1.0 3.770
1 2.189 0.580 0.863 1.348 0.0 0.0 3.115
2 2.125 0.638 0.831 1.348 0.0 0.0 3.531
3 3.027 0.331 1.472 1.807 1.0 0.0 3.510
4 2.094 0.827 0.860 1.886 0.0 0.0 5.390
... ... ... ... ... ... ... ...
902 2.801 0.728 2.226 0.736 0.0 2.0 3.109
903 3.652 0.872 0.867 3.983 2.0 3.0 4.040
904 3.763 0.916 0.878 2.918 0.0 6.0 4.818
905 2.831 1.393 1.077 0.906 0.0 1.0 5.317
906 4.057 1.032 1.183 4.754 1.0 3.0 8.201

907 rows × 7 columns

Exploratory Data Analysis¶

In [12]:
# Scatter plot of feature 'CIC0'
plx.scatter(x = data['CIC0'])
In [13]:
# Scatter plot of feature 'SM1_Dz(Z)'
plx.scatter(x = data['SM1_Dz(Z)'] )
In [14]:
# Scatter plot of feature 'GATS1i'
plx.scatter(x = data['GATS1i'])

NdsCH & NdssC are categorical features, so we can check how many records belong to each category¶

In [15]:
data['NdsCH'].unique()
Out[15]:
array([0, 1, 3, 2, 4], dtype=int64)
In [16]:
data['NdsCH'].value_counts()
Out[16]:
0    759
1    107
2     29
4      7
3      5
Name: NdsCH, dtype: int64
In [17]:
ndsch_df = pd.DataFrame(data['NdsCH'].value_counts())
In [18]:
ndsch_df
Out[18]:
NdsCH
0 759
1 107
2 29
4 7
3 5
In [19]:
ndsch_df = ndsch_df.rename_axis('category').reset_index()
In [20]:
ndsch_df
Out[20]:
category NdsCH
0 0 759
1 1 107
2 2 29
3 4 7
4 3 5
In [21]:
fig = plx.bar(x = ndsch_df['category'], y = ndsch_df['NdsCH'])
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 907 records, 759 belong to category 0 of 'NdsCH'

In [22]:
data['NdssC'].unique()
Out[22]:
array([1, 0, 3, 2, 4, 5, 6], dtype=int64)
In [23]:
data['NdssC'].value_counts()
Out[23]:
0    621
1    176
2     81
3     18
4      8
6      2
5      1
Name: NdssC, dtype: int64
In [24]:
ndssc_df = pd.DataFrame(data['NdssC'].value_counts())
In [25]:
ndssc_df
Out[25]:
NdssC
0 621
1 176
2 81
3 18
4 8
6 2
5 1
In [26]:
ndssc_df = ndssc_df.rename_axis('category').reset_index()
In [27]:
ndssc_df
Out[27]:
category NdssC
0 0 621
1 1 176
2 2 81
3 3 18
4 4 8
5 6 2
6 5 1
In [28]:
fig = plx.bar(x = ndssc_df['category'], y = ndssc_df['NdssC'])
fig.update_traces(dict(marker_line_width=0))
fig.show()

For 'NdssC', category '0' is the most frequent, followed by '1', '2', and so on.

In [29]:
# Scatter plot on 'MLOGP' feature
plx.scatter(x = data['MLOGP'])
In [30]:
# Scatter plot on 'LC50' feature
plx.scatter(x = data['LC50'])

Outlier Detection¶

We have to detect outliers before developing a model. There are 6 independent features, 4 of which are continuous. Using box plots, we check for the presence of outliers.¶

In [31]:
plx.box(data, x = data['CIC0'])
In [32]:
plx.box(data, x = data['SM1_Dz(Z)'])
In [33]:
plx.box(data, x = data['GATS1i'])
In [34]:
plx.box(data, x = data['MLOGP'])

All 4 continuous independent features contain outliers.¶

Here, we're not going to remove any outliers because the dataset is small, containing only 907 records.¶

Handling Outliers (Technique used: Quantile-based flooring and capping)¶

We could remove outliers if the dataset were large, but here it has only 907 records. There are many options for replacing an outlier value; for example, we could replace it with the mean or median. Here, we use the technique called "Quantile-based flooring and capping": outliers below the lower bound are replaced by the 10th percentile of the data, and outliers above the upper bound are replaced by the 90th percentile.¶
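The flooring-and-capping rule described above can also be written in a vectorized form with pandas' mask(); a minimal sketch on synthetic values (the series here is illustrative, not a dataset column):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 100.0])  # 100.0 is an obvious outlier

# IQR fences decide which values count as outliers
q1, q3 = np.percentile(s, 25), np.percentile(s, 75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replacement values: 10th percentile for the floor, 90th for the cap
p10, p90 = np.percentile(s, 10), np.percentile(s, 90)

# Values outside the fences are replaced by the 10th/90th percentiles
capped = s.mask(s < lower, p10).mask(s > upper, p90)
```

Note this is not clipping at the 10th/90th percentiles: only values beyond the IQR fences are touched, exactly as in the loop-based handling_outlier() below.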

In [35]:
def compute_outlier(data, col):
    values = data[col]
    q1 = np.percentile(values, 25)
    q3 = np.percentile(values, 75)
    iqr = q3 - q1
    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)
    tenth_percentile = np.percentile(values, 10)
    ninetieth_percentile = np.percentile(values, 90)
    return tenth_percentile, ninetieth_percentile, lower_bound, upper_bound

def handling_outlier(data):
    to_handle_cols = ['CIC0', 'SM1_Dz(Z)', 'GATS1i', 'MLOGP']
    for col in to_handle_cols:
        tenth_percentile, ninetieth_percentile, lower_bound, upper_bound = compute_outlier(data, col)
        # Floor low outliers at the 10th percentile, cap high outliers at the 90th
        data.loc[data[col] < lower_bound, col] = tenth_percentile
        data.loc[data[col] > upper_bound, col] = ninetieth_percentile
    print("Outliers handled successfully in 'handling_outlier()'.")
    return data
In [36]:
handling_outlier(data)
Outliers handled successfully in 'handling_outlier()'.
Out[36]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
0 3.260 0.829 1.676 0 1 1.453 3.770
1 2.189 0.580 0.863 0 0 1.348 3.115
2 2.125 0.638 0.831 0 0 1.348 3.531
3 3.027 0.331 1.472 1 0 1.807 3.510
4 2.094 0.827 0.860 0 0 1.886 5.390
... ... ... ... ... ... ... ...
903 2.801 0.728 2.226 0 2 0.736 3.109
904 3.652 0.872 0.867 2 3 3.983 4.040
905 3.763 0.916 0.878 0 6 2.918 4.818
906 2.831 1.393 1.077 0 1 0.906 5.317
907 4.057 1.032 1.183 1 3 4.754 8.201

907 rows × 7 columns

We can cross-check whether the outliers were handled.¶

1) CIC0¶

In [37]:
plx.box(data['CIC0'])

2) SM1_Dz(Z)¶

In [38]:
plx.box(data['SM1_Dz(Z)'])

3) GATS1i¶

In [39]:
plx.box(data['GATS1i'])

4) MLOGP¶

In [40]:
plx.box(data['MLOGP'])

Short Statistical info¶

The describe() method gives summary statistics of the data: count, mean, standard deviation, 25th percentile, 50th percentile, and so on.¶

In [41]:
# describe() gives some statistical information of the dataset.
data.describe()
Out[41]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
count 907.000000 907.000000 907.000000 907.000000 907.000000 907.000000 907.000000
mean 2.897714 0.625782 1.287370 0.229327 0.486218 2.112877 4.064723
std 0.741028 0.421883 0.378049 0.605621 0.861603 1.393402 1.456475
min 0.965000 0.000000 0.396000 0.000000 0.000000 -1.358000 0.053000
25% 2.346000 0.223000 0.951000 0.000000 0.000000 1.209000 3.151500
50% 2.937000 0.570000 1.244000 0.000000 0.000000 2.127000 3.991000
75% 3.407000 0.894500 1.562500 0.000000 1.000000 3.105000 4.909000
max 4.880000 1.860000 2.456000 4.000000 6.000000 5.934000 9.612000

Correlation¶

In [42]:
# The correlation matrix shows how strongly each pair of attributes is related
plx.imshow(data.corr(), text_auto = True, height = 700, width = 700)

From the above correlation graph we get some useful insights:¶

1) The feature 'MLOGP' is the most important for the target variable 'LC50', with a correlation of 0.6580¶

2) Next to 'MLOGP', 'SM1_Dz(Z)' is the most important feature for 'LC50', with a correlation of 0.4119¶

3) The feature 'GATS1i' is negatively correlated, though not strongly: the correlation value is -0.3878¶

4) 'NdsCH' and 'NdssC' are the least important features with respect to the target 'LC50'¶
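The per-feature correlations with the target behind these insights can be pulled out directly from the correlation matrix. A sketch on a synthetic frame (column names here are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
x = rng.rand(100)
df = pd.DataFrame({
    'f1': x,                          # strongly positively related to target
    'f2': -x + 0.1 * rng.rand(100),   # strongly negatively related
    'target': 2 * x,
})

# Correlation of every independent feature with the target, strongest first
target_corr = df.corr()['target'].drop('target').sort_values(ascending=False)
```

On the real data, `data.corr()['LC50']` sorted this way would rank 'MLOGP' first, matching the insights above.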

Dimensionality Reduction¶

We can remove an independent feature that is highly correlated with another independent feature.¶

In [43]:
def dimensionality_reduction(data, threshold):
    # The threshold comes from the user; features correlated above it are removed.
    temp_data = data.drop(['LC50'], axis=1)
    corr_columns = set()  # a set avoids duplicate column names
    corr_matrix = temp_data.corr()  # .corr() returns the correlation matrix of the dataset
    # This loop covers only the lower triangle of the correlation table,
    # so each feature pair is checked exactly once.
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            # If the absolute correlation exceeds the threshold, record the column name.
            if abs(corr_matrix.iloc[i, j]) > threshold:
                column_name = corr_matrix.columns[i]
                corr_columns.add(column_name)
    data.drop(list(corr_columns), axis=1, inplace=True)
    if len(corr_columns) == 0:
        print("Dimensionality Reduction not happened because of low correlation between independent features.")
    else:
        print("Dimensionality reduction successfully completed.")
    return data
In [44]:
dimensionality_reduction(data, 0.85)
Dimensionality Reduction not happened because of low correlation between independent features.
Out[44]:
CIC0 SM1_Dz(Z) GATS1i NdsCH NdssC MLOGP LC50
0 3.260 0.829 1.676 0 1 1.453 3.770
1 2.189 0.580 0.863 0 0 1.348 3.115
2 2.125 0.638 0.831 0 0 1.348 3.531
3 3.027 0.331 1.472 1 0 1.807 3.510
4 2.094 0.827 0.860 0 0 1.886 5.390
... ... ... ... ... ... ... ...
903 2.801 0.728 2.226 0 2 0.736 3.109
904 3.652 0.872 0.867 2 3 3.983 4.040
905 3.763 0.916 0.878 0 6 2.918 4.818
906 2.831 1.393 1.077 0 1 0.906 5.317
907 4.057 1.032 1.183 1 3 4.754 8.201

907 rows × 7 columns

Splitting Data into Train and Test set¶

In [45]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data.drop(['LC50'], axis=1), data['LC50'], test_size=0.2)
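One caveat with the split above: without a fixed random_state, every rerun produces a different train/test partition, so the scores recorded below are not exactly reproducible. A small self-contained sketch of a pinned split (synthetic data, illustrative seed):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# random_state pins the shuffle, so the same rows land in train/test each run
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.2, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.2, random_state=42)
```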

Functions¶

In [46]:
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, training_time, prediction_time = [], [], [], [], [], [], [], [], [], [], []
In [47]:
# Using the 'time' module, we capture the training and prediction times of the model
def model_trainer(model, x_train, y_train, x_test):
    start_time = time.time()
    model = model.fit(x_train, y_train)
    training_time = time.time()-start_time
    start_time = time.time()
    pred_y_test = model.predict(x_test)
    prediction_time = time.time()-start_time
    pred_y_train = model.predict(x_train)
    return pred_y_test, pred_y_train, training_time, prediction_time
In [48]:
def calculate_error_range(pred, y_test):
    # Absolute error per prediction, then counts per error range
    y_true = list(y_test)
    error = []
    zero_to_one, one_to_two, two_to_three, greater_than_three = 0, 0, 0, 0
    for i in range(len(y_true)):
        error.append(abs(pred[i] - y_true[i]))
    for x in error:
        if 0 <= x <= 1:
            zero_to_one += 1
        elif 1 < x <= 2:
            one_to_two += 1
        elif 2 < x <= 3:
            two_to_three += 1
        elif x > 3:
            greater_than_three += 1
    return error, zero_to_one, one_to_two, two_to_three, greater_than_three
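The manual counting in calculate_error_range() can also be expressed with pd.cut, which bins the absolute errors in one call. A sketch on illustrative values; the bin edges mirror the 0-1, 1-2, 2-3, and >3 ranges used above:

```python
import numpy as np
import pandas as pd

errors = pd.Series([0.2, 0.8, 1.5, 2.4, 3.7, 0.9])

# Right-closed bins: (-inf, 1], (1, 2], (2, 3], (3, inf) -- for non-negative
# errors this matches the ranges counted in calculate_error_range()
bins = pd.cut(errors, bins=[-np.inf, 1, 2, 3, np.inf],
              labels=['0 to 1', '1 to 2', '2 to 3', '>3'])
counts = bins.value_counts()
```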
In [49]:
def visualize_error(error, modelname):
    # Plotting the absolute error values in a scatter plot gives an idea of how the model performs on the test data.
    # Code to check in which error range the predicted values lies. The ranges are 0-1, 1-2, 2-3 and >3.
    zero_to_one,zero_to_one_idx, one_to_two,one_to_two_idx, two_to_three,two_to_three_idx, grt_than_three,grt_than_three_idx = [],[],[],[],[],[],[],[]
    for i in range(len(error)):
        if (error[i]>=0) & (error[i]<=1):
            zero_to_one.append(error[i])
            zero_to_one_idx.append(i)
        elif (error[i]>1) & (error[i]<=2):
            one_to_two.append(error[i])
            one_to_two_idx.append(i)
        elif (error[i]>2) & (error[i]<=3):
            two_to_three.append(error[i])
            two_to_three_idx.append(i)
        elif error[i]>3:
            grt_than_three.append(error[i])
            grt_than_three_idx.append(i)

    fig = go.Figure()
    fig.add_trace(go.Scatter(y=zero_to_one,x=zero_to_one_idx,
                        mode='markers',
                        name='0 to 1'))
    fig.add_trace(go.Scatter(y=one_to_two,x=one_to_two_idx,
                        mode='markers',
                        name='1 to 2'))
    fig.add_trace(go.Scatter(y=two_to_three,x=two_to_three_idx,
                        mode='markers',
                        name='2 to 3'))
    fig.add_trace(go.Scatter(y=grt_than_three,x=grt_than_three_idx,
                        mode='markers',
                        name='>3'))
    fig.update_layout(
        title=f"Absolute Difference b/w Predicted and Actual value of {modelname}",
        xaxis_title="Index",
        yaxis_title="Error",
        legend_title_text="Error Range",
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
    fig.show()
In [50]:
def visulaize_performance_of_the_model(pred, y_test, modelname):
    # Plot the predicted values (scatter) against a perfect-prediction line in the same figure to visualize model performance.
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=np.arange(0,11), y=np.arange(0,11),
                             mode='lines',
                             name='perfectline'))
    fig.add_trace(go.Scatter(x=pred, y=y_test,
                             mode='markers',
                             name='predictions'))
    fig.update_layout(
        title=f"Performance of {modelname} on Test data",
        xaxis_title="Predicted",
        yaxis_title="Actual",
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
    fig.show()
In [51]:
def visualize_prediction_on_traindata(pred_y_train, y_train ,modelname):
    # To check whether the model is overfitted, we predict on the training set and visualize it like the plot above.
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=np.arange(0,11), y=np.arange(0,11),
                             mode='lines',
                             name='perfectline'))
    fig.add_trace(go.Scatter(x=pred_y_train, y=y_train,
                             mode='markers',
                             name='predictions'))
    fig.update_layout(
        title=f"{modelname} model on Training data to check Overfitting",
        xaxis_title="Predicted",
        yaxis_title="Actual",
        font=dict(
            family="Courier New, monospace",
            size=13,
            color="RebeccaPurple"
        )
    )
    fig.show()
In [52]:
def recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time):
    # Record details: model name, error-range counts, scores (r2, rmse, mse, mae), training time and prediction time.
    model_name.append(modelname)
    r2_score_.append(r2_score(y_test, pred_y_test))
    mae.append(mean_absolute_error(y_test, pred_y_test))
    mse.append(mean_squared_error(y_test, pred_y_test))
    rmse.append(np.sqrt(mean_squared_error(y_test, pred_y_test)))
    error_0_to_1.append(zero_to_one)
    error_1_to_2.append(one_to_two)
    error_2_to_3.append(two_to_three)
    error_greater_than_3.append(greater_than_three)
    training_time.append(trn_time)
    prediction_time.append(pred_time)

Multiple Linear Regression¶

In [53]:
# Importing LinearRegression from sklearn.linear_model
from sklearn.linear_model import LinearRegression 
In [54]:
modelname = 'MultipleLinearRegression'
In [55]:
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(LinearRegression(), x_train, y_train, x_test)
In [56]:
trn_time
Out[56]:
0.009304046630859375
In [57]:
pred_time
Out[57]:
0.0009984970092773438
In [58]:
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
In [59]:
zero_to_one, one_to_two, two_to_three, greater_than_three
Out[59]:
(144, 24, 9, 5)
In [60]:
visualize_error(error, modelname)

The above scatter plot shows that most of the predicted values have an error between 0 and 1 (blue markers). Some errors lie between 1 and 2 (red markers), a few between 2 and 3 (green markers), and only a handful at the top have an error above 3.0¶

In [61]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

This model predicts well: the predictions (red markers) lie close to the blue line (the perfect-prediction line). Next, we have to check whether the model is overfitted to the training set.¶

In [62]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the graph we can say that this model is not overfitted and generalises well, so we can consider it for final evaluation.¶

In [63]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
In [64]:
model_name, r2_score_, mae,rmse,mse,error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[64]:
(['MultipleLinearRegression'],
 [0.5453714270668548],
 [0.7425829252963528],
 [1.0377868868311382],
 [1.0770016224786658],
 [144],
 [24],
 [9],
 [5],
 0.009304046630859375,
 0.0009984970092773438)

Ridge Regressor¶

In [65]:
# Importing Ridge package from sklearn.linear_model
from sklearn.linear_model import Ridge
In [66]:
modelname = 'RidgeRegression'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(Ridge(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

The above scatter plot shows that most of the predicted values have an error between 0 and 1 (blue markers). Some errors lie between 1 and 2 (red markers), a few between 2 and 3 (green markers), and some points at the top have an error above 3.0¶

In [67]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

This model predicts quite well, with only one point showing a large error; the graph looks the same as the one from Linear Regression. Next, we have to check whether the model is overfitted to the training set.¶

In [68]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the graph we can say that this model is not overfitted and generalises well, so we can consider it for final evaluation.¶

In [69]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae,rmse,mse,error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[69]:
(['MultipleLinearRegression', 'RidgeRegression'],
 [0.5453714270668548, 0.5445069168183965],
 [0.7425829252963528, 0.7431269740195243],
 [1.0377868868311382, 1.0387731329068615],
 [1.0770016224786658, 1.0790496216491359],
 [144, 144],
 [24, 24],
 [9, 9],
 [5, 5],
 0.002550840377807617,
 0.0012249946594238281)

Hyper-parameter Tuning on Ridge Regressor¶

In [70]:
"""
RandomizedSearchCV is used for hyper-parameter tuning. This 'RandomizedSearchCV' picks appropriate parameter and 
gives us best out of it.
"""
from sklearn.model_selection import RandomizedSearchCV
In [71]:
# A dictionary 'params' with important parameters as keys, each holding a list of candidate values
params = {
    'alpha': [0, 1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1,
              1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200],
    'max_iter': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000],
    'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs']
}
In [72]:
"""
The Ridge() is assigned as an estimator, 'params' dictionary is assigned as an param_distributions and 
RandomizedSearchCV and 'r2' metric is assigned as a scoring metric of 'RandomizedSearchCV()' and stored in a variable
'ridge_tuned'
"""
ridge_tuned = RandomizedSearchCV(Ridge(), param_distributions=params, scoring = 'r2')
In [73]:
# Fit 'ridge_tuned' on x_train and y_train
ridge_tuned.fit(x_train, y_train)
C:\Users\harip\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning:


5 fits failed out of a total of 50.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\harip\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\harip\anaconda3\lib\site-packages\sklearn\linear_model\_ridge.py", line 1011, in fit
    return super().fit(X, y, sample_weight=sample_weight)
  File "C:\Users\harip\anaconda3\lib\site-packages\sklearn\linear_model\_ridge.py", line 705, in fit
    raise ValueError(
ValueError: 'lbfgs' solver can be used only when positive=True. Please use another solver.


C:\Users\harip\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:

One or more of the test scores are non-finite: [0.54797433 0.58181972 0.55517176 0.57852306 0.58177985 0.55147954
 0.57430821 0.54797433 0.54797433        nan]

Out[73]:
RandomizedSearchCV(estimator=Ridge(),
                   param_distributions={'alpha': [0, 1e-10, 1e-09, 1e-08, 1e-07,
                                                  1e-06, 1e-05, 0.0001, 0.001,
                                                  0.01, 0.1, 1, 5, 10, 15, 20,
                                                  25, 30, 35, 40, 50, 60, 70,
                                                  80, 90, 100, 150, 200],
                                        'max_iter': [100, 200, 300, 400, 500,
                                                     600, 700, 800, 900, 1000,
                                                     1500, 2000],
                                        'solver': ['auto', 'svd', 'cholesky',
                                                   'lsqr', 'sparse_cg', 'sag',
                                                   'saga', 'lbfgs']},
                   scoring='r2')
In [74]:
# Get the best parameters of the tuned Ridge Regressor
ridge_tuned.best_params_
Out[74]:
{'solver': 'saga', 'max_iter': 700, 'alpha': 1e-07}
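An aside: rather than re-fitting via Ridge().set_params(**best_params_) as done in the next cell, RandomizedSearchCV (with its default refit=True) already exposes the winning model as best_estimator_, refit on the full training data. A self-contained sketch on synthetic data (parameter grid and seed are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.01 * rng.rand(60)

search = RandomizedSearchCV(Ridge(), {'alpha': [0.001, 0.01, 0.1, 1.0]},
                            n_iter=4, cv=3, scoring='r2', random_state=0)
search.fit(X, y)

best = search.best_estimator_  # already refit on all of (X, y)
```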
In [75]:
modelname = 'RidgeRegression Tuned'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(Ridge().set_params(**ridge_tuned.best_params_), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

The plot looks the same as for the un-tuned Ridge Regression.¶

In [76]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

This model predicts quite well, with only one point showing a large error. Next, we have to check whether the model is overfitted to the training set.¶

In [77]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the graph we can say that this model is not overfitted and generalises well, so we can consider it for final evaluation. There is not much difference after tuning.¶

In [78]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae,rmse,mse,error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[78]:
(['MultipleLinearRegression', 'RidgeRegression', 'RidgeRegression Tuned'],
 [0.5453714270668548, 0.5445069168183965, 0.5453700403294548],
 [0.7425829252963528, 0.7431269740195243, 0.742606843045171],
 [1.0377868868311382, 1.0387731329068615, 1.0377884695921848],
 [1.0770016224786658, 1.0790496216491359, 1.077004907618489],
 [144, 144, 144],
 [24, 24, 24],
 [9, 9, 9],
 [5, 5, 5],
 0.0030221939086914062,
 0.0010066032409667969)

Lasso Regression¶

In [79]:
# Importing Lasso package from sklearn.linear_model
from sklearn.linear_model import Lasso
In [80]:
modelname = 'LassoRegression'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(Lasso(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

This is noticeably worse than Multiple Linear Regression; too many records have an error above 3.0¶

In [81]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

The predictions are not good; we have to either tune or avoid the Lasso algorithm.¶

In [82]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

The Lasso algorithm did not properly learn even the training set; that's why the prediction is bad.¶

This Lasso model is not considered for final evaluation because both learning and prediction are poor with the default parameters of the Lasso Regressor. We can check its performance after tuning some of the parameters.¶

Hyper-parameter Tuning on Lasso Regression¶

In [83]:
modelname = 'LassoRegression Tuned'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(Lasso(alpha = 0.1, selection = 'random'), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

Here, we adjusted the alpha value from the default 1.0 down to 0.1. Setting alpha to zero would reduce Lasso to plain linear regression and is not advised, so we chose the smaller value of 0.1. Setting selection to 'random' often leads to significantly faster convergence, so we set the 'selection' parameter to 'random'.¶
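The effect of alpha can be seen directly: a larger alpha applies a stronger l1 penalty and shrinks the Lasso coefficients harder toward zero. A sketch on synthetic data (coefficients and alphas are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X @ np.array([3.0, -2.0, 0.0, 0.5]) + 0.1 * rng.rand(100)

# The l1 penalty grows with alpha, so the fitted coefficient vector shrinks
small = Lasso(alpha=0.01).fit(X, y)
large = Lasso(alpha=1.0).fit(X, y)

l1_small = np.abs(small.coef_).sum()
l1_large = np.abs(large.coef_).sum()
```

This mirrors what happened above: the default alpha=1.0 shrank the model too aggressively to learn, while a smaller alpha left it enough capacity.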

In [84]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

After changing these parameters, the predictions moved closer to the blue line. The predictions are good and look similar to the output of Multiple Linear Regression. Most importantly, we still have to check whether the model is overfitted to the training data.¶

In [85]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above plot we confirm that the model is not overfitted to the training data, so we can consider this model for final evaluation. The graphs show that the tuning improves both learning and prediction.¶

In [86]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[86]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766],
 [144, 144, 144, 134],
 [24, 24, 24, 29],
 [9, 9, 9, 11],
 [5, 5, 5, 8],
 0.0015027523040771484,
 0.0010752677917480469)

Random Forest Regressor¶

In [87]:
# Importing RandomForestRegressor from sklearn.ensemble
from sklearn.ensemble import RandomForestRegressor
In [88]:
modelname = 'RandomForestRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(RandomForestRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

From the above scatter plot, we can see that most of the errors range between 0 and 1. Some of the values fall between 1 and 2, a few between 2 and 3, and only a few points exceed 3.0.¶

In [89]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

RandomForestRegressor is good at prediction. At the same time, we have to check whether the model is overfitted to the training set.¶

In [90]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above graph, we can say that the predictions lie on and around the perfect line; since most of them cluster around it rather than sitting exactly on it, we cannot call this overfitting on the training set. Looking at the performance on the test data, only a few points fall away from the perfect line. So we do not consider this an overfitted model, and we can consider it for final evaluation.¶

In [91]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[91]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147],
 [144, 144, 144, 134, 148],
 [24, 24, 24, 29, 21],
 [9, 9, 9, 11, 11],
 [5, 5, 5, 8, 2],
 0.17693495750427246,
 0.009001970291137695)

Hyper-parameter Tuning on RandomForestRegressor¶

In [92]:
modelname = 'RandomForestRegressor Tuned'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(RandomForestRegressor(min_samples_split=15, min_samples_leaf=10, max_depth=10), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)
In [93]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

The predictions are good. Only one point lies far away from the perfect line.¶

In [94]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above scatter plot, the predictions lie on and around the perfect line, so this is not an overfitted model. However, it learns and predicts slightly worse than the RandomForestRegressor with default parameters. We can still consider it for final evaluation.¶
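The three parameters tuned here can also be searched systematically with `GridSearchCV` instead of being fixed by hand. A small sketch, with synthetic data standing in for the notebook's `x_train`/`y_train` and a deliberately tiny grid to keep it fast:

```python
# Sketch: grid-search the same RandomForest parameters
# (max_depth, min_samples_split, min_samples_leaf) by cross-validation.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)

param_grid = {
    'max_depth': [5, 10],
    'min_samples_split': [2, 15],
    'min_samples_leaf': [1, 10],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid, cv=3, scoring='neg_mean_squared_error',
)
search.fit(X, y)
print(search.best_params_)  # the best combination found on this grid
```

`search.best_estimator_` is the refitted forest with those parameters and can be passed straight to `model_trainer`.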

In [95]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[95]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726,
  0.5523520343191741],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009,
  0.7300144780474158],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038,
  1.0297887019550571],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147,
  1.0604647706742814],
 [144, 144, 144, 134, 148, 145],
 [24, 24, 24, 29, 21, 25],
 [9, 9, 9, 11, 11, 7],
 [5, 5, 5, 8, 2, 5],
 0.12148642539978027,
 0.007001161575317383)

SVR (Support Vector Regressor)¶

In [96]:
# Importing SupportVectorRegressor(SVR) from sklearn.svm
from sklearn.svm import SVR
In [97]:
modelname = 'SupportVectorRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(SVR(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

The predictions are good; only one record gives an error greater than 3.0.¶

Next, we have to check with visualizations on both the training and testing sets.¶

In [98]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

From the above scatter plot, we can see that the predictions lie on and around the perfect line. Only one point is far away from it.¶

In [99]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above scatter plot we confirm that the model is not overfitted to the training data. We can consider this model for final evaluation.¶

We got a generalized model with the default parameters, so no hyper-parameter tuning was done on SVR.¶
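If we did want to double-check that the SVR defaults are near-optimal, a small cross-validated grid over `C` and `epsilon` would do it. A sketch with synthetic stand-in data; the pipeline keeps feature scaling inside the cross-validation, since SVR is sensitive to feature scale:

```python
# Sketch: verify SVR defaults with a small grid over C and epsilon.
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=3)

# Scaling lives inside the pipeline so each CV fold is scaled independently.
pipe = make_pipeline(StandardScaler(), SVR())
grid = {'svr__C': [0.1, 1.0, 10.0], 'svr__epsilon': [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=3, scoring='neg_mean_absolute_error')
search.fit(X, y)
print(search.best_params_)
```

If the search lands on `C=1.0, epsilon=0.1` (the scikit-learn defaults), that confirms the choice to skip tuning.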

In [100]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[100]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned',
  'SupportVectorRegressor'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726,
  0.5523520343191741,
  0.5657549065097839],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009,
  0.7300144780474158,
  0.6912996633539209],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038,
  1.0297887019550571,
  1.0142552752973413],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147,
  1.0604647706742814,
  1.0287137634684858],
 [144, 144, 144, 134, 148, 145, 149],
 [24, 24, 24, 29, 21, 25, 19],
 [9, 9, 9, 11, 11, 7, 9],
 [5, 5, 5, 8, 2, 5, 5],
 0.023962974548339844,
 0.00800180435180664)

KNeighborsRegressor¶

In [101]:
# Importing KNeighborsRegressor from sklearn.neighbors
from sklearn.neighbors import KNeighborsRegressor
In [102]:
# Choosing optimal 'K' value
knr_errors = []
for i in range(2, 50):
    knr = KNeighborsRegressor(n_neighbors=i)
    knr.fit(x_train, y_train)
    pred_knr = knr.predict(x_test)
    # Mean absolute error on the test set for this value of k
    knr_errors.append(np.mean(np.abs(pred_knr - np.asarray(y_test))))
In [103]:
# Line plot shows from which k value the errors are stable.
plx.line(knr_errors)

The error is stable for 'k' values between 6 and 10, so we choose k = 8.¶
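The loop above scores each k on a single test split. A more robust variant, sketched here on synthetic stand-in data, averages the error over cross-validation folds before picking k:

```python
# Sketch: choose k by 5-fold cross-validated MAE instead of one test split.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=6, noise=8.0, random_state=1)

cv_mae = {}
for k in range(2, 15):
    knr = KNeighborsRegressor(n_neighbors=k)
    # scoring returns negative MAE; negate so smaller is better
    scores = cross_val_score(knr, X, y, cv=5, scoring='neg_mean_absolute_error')
    cv_mae[k] = -scores.mean()

best_k = min(cv_mae, key=cv_mae.get)
print(best_k, cv_mae[best_k])
```

The cross-validated curve is usually smoother than the single-split one, which makes the "stable" region easier to read off.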

In [104]:
modelname = 'KNeighborsRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(KNeighborsRegressor(n_neighbors=8), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)
In [105]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

The above scatter plot shows the goodness of the predictions.¶

KNeighborsRegressor is good at prediction. We have to check whether the model is overfitted to the training data.¶

In [106]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above scatter plot we confirm that the model is not overfitted to the training data, and we can consider this model for final evaluation.¶

In [107]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[107]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned',
  'SupportVectorRegressor',
  'KNeighborsRegressor'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726,
  0.5523520343191741,
  0.5657549065097839,
  0.5133423844796292],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009,
  0.7300144780474158,
  0.6912996633539209,
  0.7381497252747253],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038,
  1.0297887019550571,
  1.0142552752973413,
  1.0737213212086887],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147,
  1.0604647706742814,
  1.0287137634684858,
  1.152877475618132],
 [144, 144, 144, 134, 148, 145, 149, 144],
 [24, 24, 24, 29, 21, 25, 19, 23],
 [9, 9, 9, 11, 11, 7, 9, 9],
 [5, 5, 5, 8, 2, 5, 5, 6],
 0.0012378692626953125,
 0.0020029544830322266)

XGBoost¶

In [108]:
# Importing XGBRegressor from xgboost
from xgboost import XGBRegressor
In [109]:
modelname = 'XGBRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(XGBRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

Most of the values lie between 0 and 1, some between 1 and 2, and a few are greater than 2.¶

In [110]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

Most of the red markers lie around the blue line (the perfect line), so this should be a good model. At the same time, we have to check whether the model is overfitted to the training set.¶

In [111]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

Yes, the model is highly overfitted to the training set. We have to fine-tune it by changing some of the parameters of XGBRegressor.¶

Hyper-parameter Tuning on XGBRegressor¶

In [112]:
modelname = 'XGBRegressor Tuned'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(XGBRegressor(n_estimators = 12, max_depth = 7), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

This model looks good in error rate. Let's check the performance on both training and testing data.¶

In [113]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

The predictions are good.¶

In [114]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

This looks better than the XGBRegressor with default parameters. However, XGBRegressor is quite unstable when the 'n_estimators' parameter (the number of gradient-boosted trees) changes, so we are not going to consider the XGBRegressor model for final evaluation.¶

GradientBoostingRegressor¶

In [115]:
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor
In [116]:
modelname = 'GradientBoosting Regressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(GradientBoostingRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

Most of the errors range between 0 and 1.¶

In [117]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

From the above scatter plot, we see that the predictions lie on and around the perfect line.¶

In [118]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

From the above scatter plot, we confirm that the model is not overfitted to the training set.¶

In [119]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[119]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned',
  'SupportVectorRegressor',
  'KNeighborsRegressor',
  'GradientBoosting Regressor'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726,
  0.5523520343191741,
  0.5657549065097839,
  0.5133423844796292,
  0.6292510454065448],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009,
  0.7300144780474158,
  0.6912996633539209,
  0.7381497252747253,
  0.6832170421355467],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038,
  1.0297887019550571,
  1.0142552752973413,
  1.0737213212086887,
  0.9371730316626096],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147,
  1.0604647706742814,
  1.0287137634684858,
  1.152877475618132,
  0.8782932912756866],
 [144, 144, 144, 134, 148, 145, 149, 144, 145],
 [24, 24, 24, 29, 21, 25, 19, 23, 29],
 [9, 9, 9, 11, 11, 7, 9, 9, 5],
 [5, 5, 5, 8, 2, 5, 5, 6, 3],
 0.6686649322509766,
 0.011003732681274414)

AdaBoostRegressor¶

In [120]:
modelname = 'AdaBoostRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(AdaBoostRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)
In [121]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

From the above scatter plot, we know that the model is unstable.¶

In [122]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

This model is unstable, so it is not considered for final evaluation.¶

CatBoostRegressor¶

In [123]:
from catboost import CatBoostRegressor
In [124]:
modelname = 'CatBoostRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(CatBoostRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)
Learning rate set to 0.038915
[... CatBoost per-iteration training log (one 'learn' RMSE line per boosting round) truncated for readability; passing verbose=False to CatBoostRegressor suppresses this output ...]
345:	learn: 0.5775775	total: 3.21s	remaining: 6.06s
346:	learn: 0.5775030	total: 3.21s	remaining: 6.05s
347:	learn: 0.5771250	total: 3.22s	remaining: 6.04s
348:	learn: 0.5766199	total: 3.23s	remaining: 6.03s
349:	learn: 0.5763879	total: 3.24s	remaining: 6.02s
350:	learn: 0.5758362	total: 3.25s	remaining: 6s
351:	learn: 0.5757173	total: 3.26s	remaining: 5.99s
352:	learn: 0.5749656	total: 3.26s	remaining: 5.98s
353:	learn: 0.5744613	total: 3.27s	remaining: 5.97s
354:	learn: 0.5740848	total: 3.28s	remaining: 5.96s
355:	learn: 0.5733969	total: 3.29s	remaining: 5.95s
356:	learn: 0.5729858	total: 3.3s	remaining: 5.94s
357:	learn: 0.5722242	total: 3.31s	remaining: 5.93s
358:	learn: 0.5719359	total: 3.31s	remaining: 5.92s
359:	learn: 0.5714055	total: 3.32s	remaining: 5.91s
360:	learn: 0.5713429	total: 3.33s	remaining: 5.9s
361:	learn: 0.5707038	total: 3.34s	remaining: 5.89s
362:	learn: 0.5697465	total: 3.35s	remaining: 5.88s
363:	learn: 0.5689653	total: 3.36s	remaining: 5.87s
364:	learn: 0.5679915	total: 3.37s	remaining: 5.86s
365:	learn: 0.5672978	total: 3.38s	remaining: 5.85s
366:	learn: 0.5665002	total: 3.39s	remaining: 5.84s
367:	learn: 0.5658477	total: 3.4s	remaining: 5.83s
368:	learn: 0.5649246	total: 3.4s	remaining: 5.82s
369:	learn: 0.5641415	total: 3.41s	remaining: 5.81s
370:	learn: 0.5634401	total: 3.42s	remaining: 5.8s
371:	learn: 0.5623776	total: 3.43s	remaining: 5.79s
372:	learn: 0.5621167	total: 3.44s	remaining: 5.78s
373:	learn: 0.5614852	total: 3.45s	remaining: 5.77s
374:	learn: 0.5611128	total: 3.46s	remaining: 5.76s
375:	learn: 0.5607857	total: 3.46s	remaining: 5.75s
376:	learn: 0.5600001	total: 3.47s	remaining: 5.74s
377:	learn: 0.5592140	total: 3.48s	remaining: 5.73s
378:	learn: 0.5588918	total: 3.49s	remaining: 5.72s
379:	learn: 0.5581588	total: 3.5s	remaining: 5.71s
380:	learn: 0.5574779	total: 3.51s	remaining: 5.7s
381:	learn: 0.5571169	total: 3.52s	remaining: 5.69s
382:	learn: 0.5564728	total: 3.53s	remaining: 5.68s
383:	learn: 0.5560468	total: 3.54s	remaining: 5.67s
384:	learn: 0.5553551	total: 3.54s	remaining: 5.66s
385:	learn: 0.5546423	total: 3.55s	remaining: 5.65s
386:	learn: 0.5545394	total: 3.56s	remaining: 5.64s
387:	learn: 0.5535654	total: 3.57s	remaining: 5.63s
388:	learn: 0.5528708	total: 3.58s	remaining: 5.63s
389:	learn: 0.5525627	total: 3.6s	remaining: 5.62s
390:	learn: 0.5514080	total: 3.6s	remaining: 5.61s
391:	learn: 0.5511985	total: 3.61s	remaining: 5.6s
392:	learn: 0.5511267	total: 3.62s	remaining: 5.59s
393:	learn: 0.5503489	total: 3.63s	remaining: 5.59s
394:	learn: 0.5500260	total: 3.64s	remaining: 5.58s
395:	learn: 0.5493351	total: 3.65s	remaining: 5.57s
396:	learn: 0.5489435	total: 3.66s	remaining: 5.56s
397:	learn: 0.5482200	total: 3.67s	remaining: 5.55s
398:	learn: 0.5475617	total: 3.68s	remaining: 5.55s
399:	learn: 0.5469754	total: 3.69s	remaining: 5.54s
400:	learn: 0.5462269	total: 3.7s	remaining: 5.53s
401:	learn: 0.5456011	total: 3.71s	remaining: 5.52s
402:	learn: 0.5450414	total: 3.72s	remaining: 5.51s
403:	learn: 0.5447492	total: 3.73s	remaining: 5.5s
404:	learn: 0.5440749	total: 3.74s	remaining: 5.49s
405:	learn: 0.5435660	total: 3.75s	remaining: 5.48s
406:	learn: 0.5427900	total: 3.76s	remaining: 5.47s
407:	learn: 0.5417441	total: 3.77s	remaining: 5.46s
408:	learn: 0.5410957	total: 3.77s	remaining: 5.45s
409:	learn: 0.5406065	total: 3.78s	remaining: 5.45s
410:	learn: 0.5404060	total: 3.79s	remaining: 5.44s
411:	learn: 0.5397333	total: 3.8s	remaining: 5.43s
412:	learn: 0.5392882	total: 3.81s	remaining: 5.42s
413:	learn: 0.5383766	total: 3.82s	remaining: 5.41s
414:	learn: 0.5377791	total: 3.83s	remaining: 5.4s
415:	learn: 0.5371861	total: 3.84s	remaining: 5.39s
416:	learn: 0.5365999	total: 3.85s	remaining: 5.38s
417:	learn: 0.5362459	total: 3.85s	remaining: 5.37s
418:	learn: 0.5355399	total: 3.87s	remaining: 5.36s
419:	learn: 0.5351093	total: 3.87s	remaining: 5.35s
420:	learn: 0.5341131	total: 3.88s	remaining: 5.34s
421:	learn: 0.5335338	total: 3.89s	remaining: 5.33s
422:	learn: 0.5330742	total: 3.9s	remaining: 5.32s
423:	learn: 0.5326561	total: 3.91s	remaining: 5.31s
424:	learn: 0.5323020	total: 3.92s	remaining: 5.3s
425:	learn: 0.5316538	total: 3.93s	remaining: 5.29s
426:	learn: 0.5312457	total: 3.93s	remaining: 5.28s
427:	learn: 0.5307360	total: 3.94s	remaining: 5.27s
428:	learn: 0.5301479	total: 3.95s	remaining: 5.26s
429:	learn: 0.5295698	total: 3.96s	remaining: 5.25s
430:	learn: 0.5290427	total: 3.97s	remaining: 5.24s
431:	learn: 0.5281144	total: 3.98s	remaining: 5.23s
432:	learn: 0.5273083	total: 3.99s	remaining: 5.22s
433:	learn: 0.5267205	total: 4s	remaining: 5.21s
434:	learn: 0.5256893	total: 4s	remaining: 5.2s
435:	learn: 0.5253536	total: 4.01s	remaining: 5.19s
436:	learn: 0.5246517	total: 4.02s	remaining: 5.18s
437:	learn: 0.5237119	total: 4.03s	remaining: 5.17s
438:	learn: 0.5230593	total: 4.04s	remaining: 5.16s
439:	learn: 0.5225324	total: 4.05s	remaining: 5.15s
440:	learn: 0.5219825	total: 4.05s	remaining: 5.14s
441:	learn: 0.5217943	total: 4.06s	remaining: 5.13s
442:	learn: 0.5211542	total: 4.07s	remaining: 5.12s
443:	learn: 0.5206658	total: 4.08s	remaining: 5.11s
444:	learn: 0.5204323	total: 4.09s	remaining: 5.1s
445:	learn: 0.5198973	total: 4.1s	remaining: 5.09s
446:	learn: 0.5195815	total: 4.11s	remaining: 5.08s
447:	learn: 0.5192476	total: 4.12s	remaining: 5.07s
448:	learn: 0.5187147	total: 4.12s	remaining: 5.06s
449:	learn: 0.5183929	total: 4.13s	remaining: 5.05s
450:	learn: 0.5181662	total: 4.14s	remaining: 5.04s
451:	learn: 0.5175579	total: 4.15s	remaining: 5.03s
452:	learn: 0.5172561	total: 4.16s	remaining: 5.02s
453:	learn: 0.5167894	total: 4.17s	remaining: 5.01s
454:	learn: 0.5165202	total: 4.18s	remaining: 5s
455:	learn: 0.5156852	total: 4.18s	remaining: 4.99s
456:	learn: 0.5151862	total: 4.19s	remaining: 4.98s
457:	learn: 0.5142967	total: 4.2s	remaining: 4.97s
458:	learn: 0.5138300	total: 4.21s	remaining: 4.96s
459:	learn: 0.5134657	total: 4.22s	remaining: 4.95s
460:	learn: 0.5131495	total: 4.23s	remaining: 4.94s
461:	learn: 0.5127743	total: 4.24s	remaining: 4.93s
462:	learn: 0.5125532	total: 4.25s	remaining: 4.92s
463:	learn: 0.5120490	total: 4.25s	remaining: 4.91s
464:	learn: 0.5116764	total: 4.26s	remaining: 4.9s
465:	learn: 0.5115397	total: 4.27s	remaining: 4.89s
466:	learn: 0.5108901	total: 4.28s	remaining: 4.88s
467:	learn: 0.5107070	total: 4.29s	remaining: 4.88s
468:	learn: 0.5104565	total: 4.3s	remaining: 4.87s
469:	learn: 0.5095444	total: 4.3s	remaining: 4.85s
470:	learn: 0.5093182	total: 4.31s	remaining: 4.84s
471:	learn: 0.5086129	total: 4.32s	remaining: 4.83s
472:	learn: 0.5084896	total: 4.33s	remaining: 4.83s
473:	learn: 0.5080841	total: 4.34s	remaining: 4.82s
474:	learn: 0.5077696	total: 4.35s	remaining: 4.81s
475:	learn: 0.5069961	total: 4.36s	remaining: 4.79s
476:	learn: 0.5065801	total: 4.37s	remaining: 4.79s
477:	learn: 0.5063879	total: 4.38s	remaining: 4.78s
478:	learn: 0.5059936	total: 4.38s	remaining: 4.77s
479:	learn: 0.5054010	total: 4.39s	remaining: 4.76s
480:	learn: 0.5048479	total: 4.4s	remaining: 4.75s
481:	learn: 0.5043737	total: 4.41s	remaining: 4.74s
482:	learn: 0.5042897	total: 4.42s	remaining: 4.73s
483:	learn: 0.5042001	total: 4.43s	remaining: 4.72s
484:	learn: 0.5037486	total: 4.44s	remaining: 4.71s
485:	learn: 0.5033463	total: 4.44s	remaining: 4.7s
486:	learn: 0.5028856	total: 4.45s	remaining: 4.69s
487:	learn: 0.5024053	total: 4.46s	remaining: 4.68s
488:	learn: 0.5019062	total: 4.47s	remaining: 4.67s
489:	learn: 0.5012880	total: 4.48s	remaining: 4.66s
490:	learn: 0.5009530	total: 4.49s	remaining: 4.65s
491:	learn: 0.5004142	total: 4.49s	remaining: 4.64s
492:	learn: 0.4997081	total: 4.5s	remaining: 4.63s
493:	learn: 0.4991491	total: 4.51s	remaining: 4.62s
494:	learn: 0.4987222	total: 4.52s	remaining: 4.61s
495:	learn: 0.4982031	total: 4.53s	remaining: 4.6s
496:	learn: 0.4976786	total: 4.54s	remaining: 4.59s
497:	learn: 0.4969792	total: 4.55s	remaining: 4.58s
498:	learn: 0.4966860	total: 4.56s	remaining: 4.57s
499:	learn: 0.4966342	total: 4.57s	remaining: 4.57s
500:	learn: 0.4965745	total: 4.58s	remaining: 4.56s
501:	learn: 0.4965352	total: 4.59s	remaining: 4.55s
502:	learn: 0.4956031	total: 4.6s	remaining: 4.54s
503:	learn: 0.4951673	total: 4.61s	remaining: 4.53s
504:	learn: 0.4949997	total: 4.61s	remaining: 4.52s
505:	learn: 0.4945065	total: 4.62s	remaining: 4.51s
506:	learn: 0.4943521	total: 4.63s	remaining: 4.5s
507:	learn: 0.4938577	total: 4.64s	remaining: 4.5s
508:	learn: 0.4933904	total: 4.65s	remaining: 4.49s
509:	learn: 0.4929696	total: 4.66s	remaining: 4.48s
510:	learn: 0.4924054	total: 4.67s	remaining: 4.47s
511:	learn: 0.4921449	total: 4.68s	remaining: 4.46s
512:	learn: 0.4919888	total: 4.69s	remaining: 4.45s
513:	learn: 0.4919275	total: 4.7s	remaining: 4.44s
514:	learn: 0.4913437	total: 4.7s	remaining: 4.43s
515:	learn: 0.4909638	total: 4.71s	remaining: 4.42s
516:	learn: 0.4903935	total: 4.72s	remaining: 4.41s
517:	learn: 0.4899476	total: 4.73s	remaining: 4.4s
518:	learn: 0.4893635	total: 4.74s	remaining: 4.39s
519:	learn: 0.4889604	total: 4.75s	remaining: 4.38s
520:	learn: 0.4881536	total: 4.76s	remaining: 4.37s
521:	learn: 0.4875722	total: 4.77s	remaining: 4.37s
522:	learn: 0.4871427	total: 4.78s	remaining: 4.36s
523:	learn: 0.4866337	total: 4.79s	remaining: 4.35s
524:	learn: 0.4860427	total: 4.8s	remaining: 4.34s
525:	learn: 0.4854887	total: 4.81s	remaining: 4.33s
526:	learn: 0.4850130	total: 4.82s	remaining: 4.32s
527:	learn: 0.4848630	total: 4.83s	remaining: 4.32s
528:	learn: 0.4847015	total: 4.84s	remaining: 4.3s
529:	learn: 0.4843266	total: 4.84s	remaining: 4.29s
530:	learn: 0.4837687	total: 4.85s	remaining: 4.29s
531:	learn: 0.4835083	total: 4.86s	remaining: 4.28s
532:	learn: 0.4831135	total: 4.87s	remaining: 4.27s
533:	learn: 0.4826334	total: 4.88s	remaining: 4.26s
534:	learn: 0.4822224	total: 4.89s	remaining: 4.25s
535:	learn: 0.4816676	total: 4.9s	remaining: 4.24s
536:	learn: 0.4811267	total: 4.91s	remaining: 4.23s
537:	learn: 0.4803596	total: 4.91s	remaining: 4.22s
538:	learn: 0.4800116	total: 4.92s	remaining: 4.21s
539:	learn: 0.4799573	total: 4.93s	remaining: 4.2s
540:	learn: 0.4798071	total: 4.94s	remaining: 4.19s
541:	learn: 0.4797662	total: 4.95s	remaining: 4.18s
542:	learn: 0.4793019	total: 4.96s	remaining: 4.17s
543:	learn: 0.4789995	total: 4.96s	remaining: 4.16s
544:	learn: 0.4782319	total: 4.97s	remaining: 4.15s
545:	learn: 0.4774372	total: 4.98s	remaining: 4.14s
546:	learn: 0.4770149	total: 4.99s	remaining: 4.13s
547:	learn: 0.4765111	total: 5s	remaining: 4.13s
548:	learn: 0.4760720	total: 5.01s	remaining: 4.12s
549:	learn: 0.4750344	total: 5.02s	remaining: 4.11s
550:	learn: 0.4745495	total: 5.03s	remaining: 4.1s
551:	learn: 0.4741778	total: 5.04s	remaining: 4.09s
552:	learn: 0.4738215	total: 5.04s	remaining: 4.08s
553:	learn: 0.4734213	total: 5.05s	remaining: 4.07s
554:	learn: 0.4728759	total: 5.06s	remaining: 4.06s
555:	learn: 0.4724093	total: 5.07s	remaining: 4.05s
556:	learn: 0.4719460	total: 5.08s	remaining: 4.04s
557:	learn: 0.4715014	total: 5.08s	remaining: 4.03s
558:	learn: 0.4711922	total: 5.09s	remaining: 4.02s
559:	learn: 0.4707461	total: 5.1s	remaining: 4.01s
560:	learn: 0.4704898	total: 5.11s	remaining: 4s
561:	learn: 0.4703531	total: 5.12s	remaining: 3.99s
562:	learn: 0.4702130	total: 5.13s	remaining: 3.98s
563:	learn: 0.4697094	total: 5.14s	remaining: 3.97s
564:	learn: 0.4693827	total: 5.15s	remaining: 3.96s
565:	learn: 0.4691106	total: 5.16s	remaining: 3.95s
566:	learn: 0.4690066	total: 5.17s	remaining: 3.94s
567:	learn: 0.4688329	total: 5.17s	remaining: 3.94s
568:	learn: 0.4685137	total: 5.18s	remaining: 3.93s
569:	learn: 0.4679780	total: 5.19s	remaining: 3.92s
570:	learn: 0.4679418	total: 5.2s	remaining: 3.91s
571:	learn: 0.4676925	total: 5.21s	remaining: 3.9s
572:	learn: 0.4672320	total: 5.22s	remaining: 3.89s
573:	learn: 0.4668136	total: 5.23s	remaining: 3.88s
574:	learn: 0.4663503	total: 5.23s	remaining: 3.87s
575:	learn: 0.4662997	total: 5.24s	remaining: 3.86s
576:	learn: 0.4659806	total: 5.25s	remaining: 3.85s
577:	learn: 0.4657717	total: 5.26s	remaining: 3.84s
578:	learn: 0.4654588	total: 5.27s	remaining: 3.83s
579:	learn: 0.4654121	total: 5.28s	remaining: 3.82s
580:	learn: 0.4652475	total: 5.28s	remaining: 3.81s
581:	learn: 0.4646635	total: 5.29s	remaining: 3.8s
582:	learn: 0.4645486	total: 5.3s	remaining: 3.79s
583:	learn: 0.4644542	total: 5.31s	remaining: 3.78s
584:	learn: 0.4639304	total: 5.32s	remaining: 3.77s
585:	learn: 0.4634513	total: 5.33s	remaining: 3.76s
586:	learn: 0.4634047	total: 5.33s	remaining: 3.75s
587:	learn: 0.4633472	total: 5.34s	remaining: 3.74s
588:	learn: 0.4628894	total: 5.35s	remaining: 3.73s
589:	learn: 0.4625469	total: 5.36s	remaining: 3.73s
590:	learn: 0.4624341	total: 5.37s	remaining: 3.72s
591:	learn: 0.4623161	total: 5.38s	remaining: 3.71s
592:	learn: 0.4619491	total: 5.39s	remaining: 3.7s
593:	learn: 0.4615210	total: 5.4s	remaining: 3.69s
594:	learn: 0.4613002	total: 5.4s	remaining: 3.68s
595:	learn: 0.4611374	total: 5.41s	remaining: 3.67s
596:	learn: 0.4610969	total: 5.42s	remaining: 3.66s
597:	learn: 0.4607229	total: 5.43s	remaining: 3.65s
598:	learn: 0.4600619	total: 5.44s	remaining: 3.64s
599:	learn: 0.4596487	total: 5.45s	remaining: 3.63s
600:	learn: 0.4596063	total: 5.45s	remaining: 3.62s
601:	learn: 0.4595169	total: 5.46s	remaining: 3.61s
602:	learn: 0.4590338	total: 5.47s	remaining: 3.6s
603:	learn: 0.4586930	total: 5.48s	remaining: 3.59s
604:	learn: 0.4586483	total: 5.49s	remaining: 3.58s
605:	learn: 0.4582342	total: 5.5s	remaining: 3.58s
606:	learn: 0.4577017	total: 5.51s	remaining: 3.56s
607:	learn: 0.4570038	total: 5.52s	remaining: 3.56s
608:	learn: 0.4568508	total: 5.53s	remaining: 3.55s
609:	learn: 0.4563831	total: 5.53s	remaining: 3.54s
610:	learn: 0.4560270	total: 5.54s	remaining: 3.53s
611:	learn: 0.4557506	total: 5.55s	remaining: 3.52s
612:	learn: 0.4555584	total: 5.56s	remaining: 3.51s
613:	learn: 0.4554633	total: 5.57s	remaining: 3.5s
614:	learn: 0.4548328	total: 5.58s	remaining: 3.49s
615:	learn: 0.4543931	total: 5.59s	remaining: 3.48s
616:	learn: 0.4534676	total: 5.6s	remaining: 3.47s
617:	learn: 0.4531260	total: 5.61s	remaining: 3.46s
618:	learn: 0.4525905	total: 5.61s	remaining: 3.46s
619:	learn: 0.4524391	total: 5.62s	remaining: 3.45s
620:	learn: 0.4522149	total: 5.63s	remaining: 3.44s
621:	learn: 0.4520028	total: 5.64s	remaining: 3.43s
622:	learn: 0.4515927	total: 5.65s	remaining: 3.42s
623:	learn: 0.4515306	total: 5.66s	remaining: 3.41s
624:	learn: 0.4510962	total: 5.67s	remaining: 3.4s
625:	learn: 0.4509871	total: 5.68s	remaining: 3.39s
626:	learn: 0.4507215	total: 5.68s	remaining: 3.38s
627:	learn: 0.4506893	total: 5.69s	remaining: 3.37s
628:	learn: 0.4502501	total: 5.7s	remaining: 3.36s
629:	learn: 0.4500984	total: 5.71s	remaining: 3.35s
630:	learn: 0.4500369	total: 5.72s	remaining: 3.34s
631:	learn: 0.4498162	total: 5.72s	remaining: 3.33s
632:	learn: 0.4493711	total: 5.73s	remaining: 3.32s
633:	learn: 0.4493486	total: 5.74s	remaining: 3.31s
634:	learn: 0.4484798	total: 5.75s	remaining: 3.31s
635:	learn: 0.4483633	total: 5.76s	remaining: 3.3s
636:	learn: 0.4483137	total: 5.77s	remaining: 3.29s
637:	learn: 0.4482400	total: 5.78s	remaining: 3.28s
638:	learn: 0.4482190	total: 5.79s	remaining: 3.27s
639:	learn: 0.4477528	total: 5.8s	remaining: 3.26s
640:	learn: 0.4475320	total: 5.81s	remaining: 3.25s
641:	learn: 0.4474157	total: 5.82s	remaining: 3.24s
642:	learn: 0.4470214	total: 5.82s	remaining: 3.23s
643:	learn: 0.4466874	total: 5.83s	remaining: 3.22s
644:	learn: 0.4461936	total: 5.84s	remaining: 3.21s
645:	learn: 0.4460785	total: 5.85s	remaining: 3.21s
646:	learn: 0.4457503	total: 5.86s	remaining: 3.2s
647:	learn: 0.4454490	total: 5.87s	remaining: 3.19s
648:	learn: 0.4450323	total: 5.88s	remaining: 3.18s
649:	learn: 0.4448053	total: 5.88s	remaining: 3.17s
650:	learn: 0.4441291	total: 5.89s	remaining: 3.16s
651:	learn: 0.4438436	total: 5.9s	remaining: 3.15s
652:	learn: 0.4434739	total: 5.91s	remaining: 3.14s
653:	learn: 0.4430134	total: 5.92s	remaining: 3.13s
654:	learn: 0.4427834	total: 5.93s	remaining: 3.12s
655:	learn: 0.4426917	total: 5.94s	remaining: 3.11s
656:	learn: 0.4422552	total: 5.94s	remaining: 3.1s
657:	learn: 0.4417979	total: 5.96s	remaining: 3.1s
658:	learn: 0.4416304	total: 5.96s	remaining: 3.09s
659:	learn: 0.4411024	total: 5.97s	remaining: 3.08s
660:	learn: 0.4410318	total: 5.98s	remaining: 3.07s
661:	learn: 0.4408703	total: 5.99s	remaining: 3.06s
662:	learn: 0.4406867	total: 6s	remaining: 3.05s
663:	learn: 0.4403662	total: 6.01s	remaining: 3.04s
664:	learn: 0.4402024	total: 6.01s	remaining: 3.03s
665:	learn: 0.4397959	total: 6.02s	remaining: 3.02s
666:	learn: 0.4397449	total: 6.03s	remaining: 3.01s
667:	learn: 0.4393046	total: 6.04s	remaining: 3s
668:	learn: 0.4388819	total: 6.05s	remaining: 2.99s
669:	learn: 0.4383218	total: 6.06s	remaining: 2.98s
670:	learn: 0.4380107	total: 6.07s	remaining: 2.98s
671:	learn: 0.4378167	total: 6.08s	remaining: 2.97s
672:	learn: 0.4376983	total: 6.08s	remaining: 2.96s
673:	learn: 0.4372483	total: 6.09s	remaining: 2.95s
674:	learn: 0.4369046	total: 6.1s	remaining: 2.94s
675:	learn: 0.4364904	total: 6.11s	remaining: 2.93s
676:	learn: 0.4360844	total: 6.12s	remaining: 2.92s
677:	learn: 0.4358434	total: 6.13s	remaining: 2.91s
678:	learn: 0.4357327	total: 6.14s	remaining: 2.9s
679:	learn: 0.4353142	total: 6.15s	remaining: 2.89s
680:	learn: 0.4351831	total: 6.16s	remaining: 2.88s
681:	learn: 0.4351357	total: 6.17s	remaining: 2.88s
682:	learn: 0.4346652	total: 6.17s	remaining: 2.87s
683:	learn: 0.4343972	total: 6.18s	remaining: 2.86s
684:	learn: 0.4340819	total: 6.19s	remaining: 2.85s
685:	learn: 0.4339848	total: 6.2s	remaining: 2.84s
686:	learn: 0.4335208	total: 6.21s	remaining: 2.83s
687:	learn: 0.4335033	total: 6.22s	remaining: 2.82s
688:	learn: 0.4330250	total: 6.23s	remaining: 2.81s
689:	learn: 0.4326587	total: 6.24s	remaining: 2.8s
690:	learn: 0.4325160	total: 6.25s	remaining: 2.79s
691:	learn: 0.4323185	total: 6.25s	remaining: 2.78s
692:	learn: 0.4320122	total: 6.26s	remaining: 2.77s
693:	learn: 0.4317413	total: 6.27s	remaining: 2.77s
694:	learn: 0.4315810	total: 6.28s	remaining: 2.76s
695:	learn: 0.4312061	total: 6.29s	remaining: 2.75s
696:	learn: 0.4311651	total: 6.3s	remaining: 2.74s
697:	learn: 0.4308118	total: 6.31s	remaining: 2.73s
698:	learn: 0.4306985	total: 6.32s	remaining: 2.72s
699:	learn: 0.4305162	total: 6.33s	remaining: 2.71s
700:	learn: 0.4301742	total: 6.33s	remaining: 2.7s
701:	learn: 0.4298377	total: 6.34s	remaining: 2.69s
702:	learn: 0.4293014	total: 6.35s	remaining: 2.68s
703:	learn: 0.4290563	total: 6.36s	remaining: 2.67s
704:	learn: 0.4285803	total: 6.37s	remaining: 2.67s
705:	learn: 0.4283608	total: 6.38s	remaining: 2.66s
706:	learn: 0.4280074	total: 6.39s	remaining: 2.65s
707:	learn: 0.4276370	total: 6.4s	remaining: 2.64s
708:	learn: 0.4273539	total: 6.41s	remaining: 2.63s
709:	learn: 0.4272813	total: 6.42s	remaining: 2.62s
710:	learn: 0.4271799	total: 6.43s	remaining: 2.61s
711:	learn: 0.4267450	total: 6.44s	remaining: 2.6s
712:	learn: 0.4266634	total: 6.45s	remaining: 2.59s
713:	learn: 0.4265923	total: 6.45s	remaining: 2.58s
714:	learn: 0.4262512	total: 6.46s	remaining: 2.58s
715:	learn: 0.4260792	total: 6.47s	remaining: 2.57s
716:	learn: 0.4259784	total: 6.48s	remaining: 2.56s
717:	learn: 0.4255240	total: 6.49s	remaining: 2.55s
718:	learn: 0.4252893	total: 6.5s	remaining: 2.54s
719:	learn: 0.4251511	total: 6.5s	remaining: 2.53s
720:	learn: 0.4248494	total: 6.51s	remaining: 2.52s
721:	learn: 0.4246293	total: 6.52s	remaining: 2.51s
722:	learn: 0.4243620	total: 6.53s	remaining: 2.5s
723:	learn: 0.4239481	total: 6.54s	remaining: 2.49s
724:	learn: 0.4239143	total: 6.55s	remaining: 2.48s
725:	learn: 0.4235251	total: 6.56s	remaining: 2.48s
726:	learn: 0.4235052	total: 6.57s	remaining: 2.47s
727:	learn: 0.4231476	total: 6.58s	remaining: 2.46s
728:	learn: 0.4230260	total: 6.59s	remaining: 2.45s
729:	learn: 0.4227386	total: 6.59s	remaining: 2.44s
730:	learn: 0.4224769	total: 6.6s	remaining: 2.43s
731:	learn: 0.4219318	total: 6.61s	remaining: 2.42s
732:	learn: 0.4216611	total: 6.62s	remaining: 2.41s
733:	learn: 0.4213517	total: 6.63s	remaining: 2.4s
734:	learn: 0.4208475	total: 6.64s	remaining: 2.39s
735:	learn: 0.4207438	total: 6.65s	remaining: 2.38s
736:	learn: 0.4203080	total: 6.65s	remaining: 2.37s
737:	learn: 0.4197556	total: 6.66s	remaining: 2.37s
738:	learn: 0.4196778	total: 6.67s	remaining: 2.35s
739:	learn: 0.4193275	total: 6.68s	remaining: 2.35s
740:	learn: 0.4192715	total: 6.69s	remaining: 2.34s
741:	learn: 0.4191401	total: 6.69s	remaining: 2.33s
742:	learn: 0.4189095	total: 6.7s	remaining: 2.32s
743:	learn: 0.4187817	total: 6.71s	remaining: 2.31s
744:	learn: 0.4183057	total: 6.72s	remaining: 2.3s
745:	learn: 0.4181619	total: 6.73s	remaining: 2.29s
746:	learn: 0.4180108	total: 6.74s	remaining: 2.28s
747:	learn: 0.4179847	total: 6.75s	remaining: 2.27s
748:	learn: 0.4179653	total: 6.76s	remaining: 2.26s
749:	learn: 0.4175881	total: 6.76s	remaining: 2.25s
750:	learn: 0.4169682	total: 6.77s	remaining: 2.25s
751:	learn: 0.4166755	total: 6.78s	remaining: 2.24s
752:	learn: 0.4164738	total: 6.79s	remaining: 2.23s
753:	learn: 0.4160905	total: 6.8s	remaining: 2.22s
754:	learn: 0.4157598	total: 6.81s	remaining: 2.21s
755:	learn: 0.4156041	total: 6.82s	remaining: 2.2s
756:	learn: 0.4150718	total: 6.83s	remaining: 2.19s
757:	learn: 0.4148743	total: 6.83s	remaining: 2.18s
758:	learn: 0.4145444	total: 6.84s	remaining: 2.17s
759:	learn: 0.4140837	total: 6.85s	remaining: 2.16s
760:	learn: 0.4138667	total: 6.86s	remaining: 2.15s
761:	learn: 0.4135475	total: 6.87s	remaining: 2.15s
762:	learn: 0.4134828	total: 6.88s	remaining: 2.14s
763:	learn: 0.4130441	total: 6.88s	remaining: 2.13s
764:	learn: 0.4127324	total: 6.89s	remaining: 2.12s
765:	learn: 0.4123391	total: 6.9s	remaining: 2.11s
766:	learn: 0.4121664	total: 6.91s	remaining: 2.1s
767:	learn: 0.4118238	total: 6.92s	remaining: 2.09s
768:	learn: 0.4112535	total: 6.93s	remaining: 2.08s
769:	learn: 0.4111060	total: 6.94s	remaining: 2.07s
770:	learn: 0.4108241	total: 6.95s	remaining: 2.06s
771:	learn: 0.4105582	total: 6.96s	remaining: 2.05s
772:	learn: 0.4100413	total: 6.97s	remaining: 2.04s
773:	learn: 0.4095548	total: 6.97s	remaining: 2.04s
774:	learn: 0.4095340	total: 6.98s	remaining: 2.03s
775:	learn: 0.4093985	total: 6.99s	remaining: 2.02s
776:	learn: 0.4089372	total: 7s	remaining: 2.01s
777:	learn: 0.4089058	total: 7.01s	remaining: 2s
778:	learn: 0.4085733	total: 7.02s	remaining: 1.99s
779:	learn: 0.4084440	total: 7.03s	remaining: 1.98s
780:	learn: 0.4080624	total: 7.03s	remaining: 1.97s
781:	learn: 0.4075909	total: 7.04s	remaining: 1.96s
782:	learn: 0.4075123	total: 7.05s	remaining: 1.95s
783:	learn: 0.4072511	total: 7.06s	remaining: 1.94s
784:	learn: 0.4071236	total: 7.07s	remaining: 1.94s
785:	learn: 0.4067208	total: 7.07s	remaining: 1.93s
786:	learn: 0.4064287	total: 7.08s	remaining: 1.92s
787:	learn: 0.4060867	total: 7.09s	remaining: 1.91s
788:	learn: 0.4058277	total: 7.1s	remaining: 1.9s
789:	learn: 0.4054365	total: 7.11s	remaining: 1.89s
790:	learn: 0.4049494	total: 7.12s	remaining: 1.88s
791:	learn: 0.4048343	total: 7.13s	remaining: 1.87s
792:	learn: 0.4045788	total: 7.14s	remaining: 1.86s
793:	learn: 0.4043324	total: 7.15s	remaining: 1.85s
794:	learn: 0.4039726	total: 7.16s	remaining: 1.84s
795:	learn: 0.4036423	total: 7.16s	remaining: 1.84s
796:	learn: 0.4032558	total: 7.17s	remaining: 1.83s
797:	learn: 0.4029862	total: 7.18s	remaining: 1.82s
798:	learn: 0.4027647	total: 7.19s	remaining: 1.81s
799:	learn: 0.4024700	total: 7.2s	remaining: 1.8s
800:	learn: 0.4021938	total: 7.21s	remaining: 1.79s
801:	learn: 0.4020185	total: 7.21s	remaining: 1.78s
802:	learn: 0.4017005	total: 7.22s	remaining: 1.77s
803:	learn: 0.4015810	total: 7.23s	remaining: 1.76s
804:	learn: 0.4013468	total: 7.24s	remaining: 1.75s
805:	learn: 0.4010200	total: 7.25s	remaining: 1.75s
806:	learn: 0.4009991	total: 7.26s	remaining: 1.74s
807:	learn: 0.4009826	total: 7.26s	remaining: 1.73s
808:	learn: 0.4007919	total: 7.27s	remaining: 1.72s
809:	learn: 0.4005360	total: 7.28s	remaining: 1.71s
810:	learn: 0.4003563	total: 7.29s	remaining: 1.7s
811:	learn: 0.4000402	total: 7.3s	remaining: 1.69s
812:	learn: 0.3998983	total: 7.31s	remaining: 1.68s
813:	learn: 0.3998088	total: 7.32s	remaining: 1.67s
814:	learn: 0.3994047	total: 7.33s	remaining: 1.66s
815:	learn: 0.3993247	total: 7.34s	remaining: 1.66s
816:	learn: 0.3991212	total: 7.35s	remaining: 1.65s
817:	learn: 0.3988378	total: 7.36s	remaining: 1.64s
818:	learn: 0.3986816	total: 7.37s	remaining: 1.63s
819:	learn: 0.3984393	total: 7.38s	remaining: 1.62s
820:	learn: 0.3980991	total: 7.38s	remaining: 1.61s
821:	learn: 0.3980435	total: 7.39s	remaining: 1.6s
822:	learn: 0.3978820	total: 7.4s	remaining: 1.59s
823:	learn: 0.3977240	total: 7.41s	remaining: 1.58s
824:	learn: 0.3976546	total: 7.42s	remaining: 1.57s
825:	learn: 0.3975777	total: 7.43s	remaining: 1.56s
826:	learn: 0.3975505	total: 7.44s	remaining: 1.56s
827:	learn: 0.3971492	total: 7.45s	remaining: 1.55s
828:	learn: 0.3970826	total: 7.46s	remaining: 1.54s
829:	learn: 0.3969634	total: 7.46s	remaining: 1.53s
830:	learn: 0.3968003	total: 7.47s	remaining: 1.52s
831:	learn: 0.3965743	total: 7.48s	remaining: 1.51s
832:	learn: 0.3961126	total: 7.49s	remaining: 1.5s
833:	learn: 0.3958562	total: 7.5s	remaining: 1.49s
834:	learn: 0.3957910	total: 7.51s	remaining: 1.48s
835:	learn: 0.3957808	total: 7.51s	remaining: 1.47s
836:	learn: 0.3956465	total: 7.52s	remaining: 1.47s
837:	learn: 0.3954929	total: 7.54s	remaining: 1.46s
838:	learn: 0.3951938	total: 7.54s	remaining: 1.45s
839:	learn: 0.3947798	total: 7.55s	remaining: 1.44s
840:	learn: 0.3946266	total: 7.56s	remaining: 1.43s
841:	learn: 0.3942274	total: 7.57s	remaining: 1.42s
842:	learn: 0.3940865	total: 7.58s	remaining: 1.41s
843:	learn: 0.3938476	total: 7.59s	remaining: 1.4s
844:	learn: 0.3936656	total: 7.6s	remaining: 1.39s
845:	learn: 0.3935358	total: 7.61s	remaining: 1.38s
846:	learn: 0.3932306	total: 7.61s	remaining: 1.38s
847:	learn: 0.3930780	total: 7.62s	remaining: 1.37s
848:	learn: 0.3930119	total: 7.63s	remaining: 1.36s
849:	learn: 0.3928333	total: 7.64s	remaining: 1.35s
850:	learn: 0.3925365	total: 7.65s	remaining: 1.34s
851:	learn: 0.3925268	total: 7.66s	remaining: 1.33s
852:	learn: 0.3921499	total: 7.66s	remaining: 1.32s
853:	learn: 0.3921204	total: 7.67s	remaining: 1.31s
854:	learn: 0.3918257	total: 7.68s	remaining: 1.3s
855:	learn: 0.3914995	total: 7.69s	remaining: 1.29s
856:	learn: 0.3913293	total: 7.7s	remaining: 1.28s
857:	learn: 0.3911320	total: 7.71s	remaining: 1.27s
858:	learn: 0.3910748	total: 7.71s	remaining: 1.27s
859:	learn: 0.3910252	total: 7.72s	remaining: 1.26s
860:	learn: 0.3908528	total: 7.73s	remaining: 1.25s
861:	learn: 0.3906230	total: 7.74s	remaining: 1.24s
862:	learn: 0.3904500	total: 7.75s	remaining: 1.23s
863:	learn: 0.3900264	total: 7.76s	remaining: 1.22s
864:	learn: 0.3897596	total: 7.77s	remaining: 1.21s
865:	learn: 0.3897447	total: 7.78s	remaining: 1.2s
866:	learn: 0.3893737	total: 7.79s	remaining: 1.19s
867:	learn: 0.3891295	total: 7.79s	remaining: 1.19s
868:	learn: 0.3886800	total: 7.8s	remaining: 1.18s
869:	learn: 0.3883654	total: 7.81s	remaining: 1.17s
870:	learn: 0.3882925	total: 7.82s	remaining: 1.16s
871:	learn: 0.3882796	total: 7.83s	remaining: 1.15s
872:	learn: 0.3880443	total: 7.83s	remaining: 1.14s
873:	learn: 0.3877714	total: 7.84s	remaining: 1.13s
874:	learn: 0.3873654	total: 7.85s	remaining: 1.12s
875:	learn: 0.3870112	total: 7.86s	remaining: 1.11s
876:	learn: 0.3868089	total: 7.87s	remaining: 1.1s
877:	learn: 0.3867992	total: 7.88s	remaining: 1.09s
878:	learn: 0.3862621	total: 7.89s	remaining: 1.08s
879:	learn: 0.3861514	total: 7.9s	remaining: 1.08s
880:	learn: 0.3859329	total: 7.91s	remaining: 1.07s
881:	learn: 0.3854951	total: 7.92s	remaining: 1.06s
882:	learn: 0.3850635	total: 7.92s	remaining: 1.05s
883:	learn: 0.3847001	total: 7.93s	remaining: 1.04s
884:	learn: 0.3844899	total: 7.94s	remaining: 1.03s
885:	learn: 0.3843185	total: 7.95s	remaining: 1.02s
886:	learn: 0.3839834	total: 7.96s	remaining: 1.01s
887:	learn: 0.3837849	total: 7.97s	remaining: 1s
888:	learn: 0.3836384	total: 7.98s	remaining: 996ms
889:	learn: 0.3832564	total: 7.99s	remaining: 987ms
890:	learn: 0.3829798	total: 8s	remaining: 978ms
891:	learn: 0.3826484	total: 8.01s	remaining: 970ms
892:	learn: 0.3825027	total: 8.02s	remaining: 961ms
893:	learn: 0.3823766	total: 8.03s	remaining: 952ms
894:	learn: 0.3822161	total: 8.04s	remaining: 943ms
895:	learn: 0.3818894	total: 8.04s	remaining: 934ms
896:	learn: 0.3818360	total: 8.05s	remaining: 925ms
897:	learn: 0.3817288	total: 8.06s	remaining: 916ms
898:	learn: 0.3815310	total: 8.07s	remaining: 907ms
899:	learn: 0.3812956	total: 8.08s	remaining: 898ms
900:	learn: 0.3810459	total: 8.09s	remaining: 889ms
901:	learn: 0.3808124	total: 8.1s	remaining: 880ms
902:	learn: 0.3805593	total: 8.1s	remaining: 871ms
903:	learn: 0.3804881	total: 8.11s	remaining: 861ms
904:	learn: 0.3801978	total: 8.12s	remaining: 852ms
905:	learn: 0.3800990	total: 8.13s	remaining: 843ms
906:	learn: 0.3799667	total: 8.14s	remaining: 835ms
907:	learn: 0.3799322	total: 8.15s	remaining: 826ms
908:	learn: 0.3795900	total: 8.16s	remaining: 817ms
909:	learn: 0.3793654	total: 8.16s	remaining: 808ms
910:	learn: 0.3789361	total: 8.17s	remaining: 799ms
911:	learn: 0.3787479	total: 8.18s	remaining: 790ms
912:	learn: 0.3786813	total: 8.19s	remaining: 781ms
913:	learn: 0.3785789	total: 8.2s	remaining: 772ms
914:	learn: 0.3784972	total: 8.21s	remaining: 763ms
915:	learn: 0.3781728	total: 8.22s	remaining: 754ms
916:	learn: 0.3781668	total: 8.23s	remaining: 745ms
917:	learn: 0.3778885	total: 8.24s	remaining: 736ms
918:	learn: 0.3778138	total: 8.24s	remaining: 727ms
919:	learn: 0.3773606	total: 8.25s	remaining: 718ms
920:	learn: 0.3771310	total: 8.26s	remaining: 709ms
921:	learn: 0.3767315	total: 8.27s	remaining: 700ms
922:	learn: 0.3763909	total: 8.28s	remaining: 691ms
923:	learn: 0.3762752	total: 8.29s	remaining: 682ms
924:	learn: 0.3758561	total: 8.3s	remaining: 673ms
925:	learn: 0.3754948	total: 8.31s	remaining: 664ms
926:	learn: 0.3751872	total: 8.32s	remaining: 655ms
927:	learn: 0.3750880	total: 8.32s	remaining: 646ms
928:	learn: 0.3747080	total: 8.34s	remaining: 637ms
929:	learn: 0.3745016	total: 8.35s	remaining: 628ms
930:	learn: 0.3743161	total: 8.36s	remaining: 619ms
931:	learn: 0.3739932	total: 8.36s	remaining: 610ms
932:	learn: 0.3738232	total: 8.37s	remaining: 601ms
933:	learn: 0.3735402	total: 8.38s	remaining: 592ms
934:	learn: 0.3734337	total: 8.39s	remaining: 583ms
935:	learn: 0.3731022	total: 8.4s	remaining: 574ms
936:	learn: 0.3729378	total: 8.4s	remaining: 565ms
937:	learn: 0.3728856	total: 8.41s	remaining: 556ms
938:	learn: 0.3726087	total: 8.42s	remaining: 547ms
939:	learn: 0.3724246	total: 8.43s	remaining: 538ms
940:	learn: 0.3722745	total: 8.44s	remaining: 529ms
941:	learn: 0.3720113	total: 8.45s	remaining: 520ms
942:	learn: 0.3717664	total: 8.46s	remaining: 511ms
943:	learn: 0.3715789	total: 8.46s	remaining: 502ms
944:	learn: 0.3714875	total: 8.47s	remaining: 493ms
945:	learn: 0.3714079	total: 8.48s	remaining: 484ms
946:	learn: 0.3713869	total: 8.49s	remaining: 475ms
947:	learn: 0.3711169	total: 8.5s	remaining: 466ms
948:	learn: 0.3706499	total: 8.51s	remaining: 457ms
949:	learn: 0.3704797	total: 8.52s	remaining: 448ms
950:	learn: 0.3701195	total: 8.52s	remaining: 439ms
951:	learn: 0.3698460	total: 8.53s	remaining: 430ms
952:	learn: 0.3694677	total: 8.54s	remaining: 421ms
953:	learn: 0.3693424	total: 8.55s	remaining: 412ms
954:	learn: 0.3690559	total: 8.56s	remaining: 403ms
955:	learn: 0.3689731	total: 8.57s	remaining: 394ms
956:	learn: 0.3684973	total: 8.58s	remaining: 385ms
957:	learn: 0.3683536	total: 8.59s	remaining: 376ms
958:	learn: 0.3680819	total: 8.6s	remaining: 368ms
959:	learn: 0.3678506	total: 8.6s	remaining: 359ms
960:	learn: 0.3676368	total: 8.61s	remaining: 350ms
961:	learn: 0.3673916	total: 8.62s	remaining: 341ms
962:	learn: 0.3671409	total: 8.63s	remaining: 332ms
963:	learn: 0.3669977	total: 8.64s	remaining: 323ms
964:	learn: 0.3667138	total: 8.65s	remaining: 314ms
965:	learn: 0.3664934	total: 8.65s	remaining: 305ms
966:	learn: 0.3662981	total: 8.66s	remaining: 296ms
967:	learn: 0.3662385	total: 8.67s	remaining: 287ms
968:	learn: 0.3661615	total: 8.68s	remaining: 278ms
969:	learn: 0.3659941	total: 8.69s	remaining: 269ms
970:	learn: 0.3657926	total: 8.7s	remaining: 260ms
971:	learn: 0.3656347	total: 8.71s	remaining: 251ms
972:	learn: 0.3653552	total: 8.71s	remaining: 242ms
973:	learn: 0.3652447	total: 8.72s	remaining: 233ms
974:	learn: 0.3652280	total: 8.73s	remaining: 224ms
975:	learn: 0.3650762	total: 8.74s	remaining: 215ms
976:	learn: 0.3647532	total: 8.75s	remaining: 206ms
977:	learn: 0.3642729	total: 8.76s	remaining: 197ms
978:	learn: 0.3640730	total: 8.77s	remaining: 188ms
979:	learn: 0.3638375	total: 8.78s	remaining: 179ms
980:	learn: 0.3636624	total: 8.79s	remaining: 170ms
981:	learn: 0.3635529	total: 8.8s	remaining: 161ms
982:	learn: 0.3635032	total: 8.81s	remaining: 152ms
983:	learn: 0.3632392	total: 8.82s	remaining: 143ms
984:	learn: 0.3631311	total: 8.83s	remaining: 134ms
985:	learn: 0.3630180	total: 8.84s	remaining: 125ms
986:	learn: 0.3626922	total: 8.85s	remaining: 117ms
987:	learn: 0.3624298	total: 8.86s	remaining: 108ms
988:	learn: 0.3623686	total: 8.87s	remaining: 98.6ms
989:	learn: 0.3622097	total: 8.88s	remaining: 89.7ms
990:	learn: 0.3621498	total: 8.88s	remaining: 80.7ms
991:	learn: 0.3620116	total: 8.89s	remaining: 71.7ms
992:	learn: 0.3617796	total: 8.9s	remaining: 62.8ms
993:	learn: 0.3616519	total: 8.91s	remaining: 53.8ms
994:	learn: 0.3615199	total: 8.92s	remaining: 44.8ms
995:	learn: 0.3612571	total: 8.93s	remaining: 35.9ms
996:	learn: 0.3609772	total: 8.94s	remaining: 26.9ms
997:	learn: 0.3606387	total: 8.95s	remaining: 17.9ms
998:	learn: 0.3604249	total: 8.96s	remaining: 8.97ms
999:	learn: 0.3603839	total: 8.97s	remaining: 0us
In [125]:
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)
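The helper `calculate_error_range` is defined earlier in the notebook; a plausible sketch of it (assuming it buckets absolute prediction error at 1, 2 and 3, matching the "Error : 0 to 1" … "Error : >3" columns used below — the name `calculate_error_range_sketch` is made up here):

```python
import numpy as np

def calculate_error_range_sketch(pred, actual):
    # Absolute prediction error per test sample
    error = np.abs(np.asarray(actual) - np.asarray(pred))
    # Count samples falling into each error bucket
    zero_to_one = int((error <= 1).sum())
    one_to_two = int(((error > 1) & (error <= 2)).sum())
    two_to_three = int(((error > 2) & (error <= 3)).sum())
    greater_than_three = int((error > 3).sum())
    return error, zero_to_one, one_to_two, two_to_three, greater_than_three

# Errors here are 0.5, 2.5 and 4.0, one per bucket except 1-2
_, a, b, c, d = calculate_error_range_sketch([1.0, 3.5, 0.0], [1.5, 1.0, 4.0])
```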
In [126]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

The CatBoost Regressor's predictions on the test set are good.¶

In [127]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

The model is not overfitted on the training set, so it is considered for the final evaluation.¶

In [128]:
recorddata(modelname, pred_y_test, y_test, zero_to_one, one_to_two, two_to_three, greater_than_three, trn_time, pred_time)
model_name, r2_score_, mae, rmse, mse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, trn_time, pred_time
Out[128]:
(['MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned',
  'SupportVectorRegressor',
  'KNeighborsRegressor',
  'GradientBoosting Regressor',
  'CatBoostRegressor'],
 [0.5453714270668548,
  0.5445069168183965,
  0.5453700403294548,
  0.4125003185488286,
  0.5962632916868726,
  0.5523520343191741,
  0.5657549065097839,
  0.5133423844796292,
  0.6292510454065448,
  0.6023629634677131],
 [0.7425829252963528,
  0.7431269740195243,
  0.742606843045171,
  0.8318949203719893,
  0.6757346611191009,
  0.7300144780474158,
  0.6912996633539209,
  0.7381497252747253,
  0.6832170421355467,
  0.6726558143661362],
 [1.0377868868311382,
  1.0387731329068615,
  1.0377884695921848,
  1.1797327420693962,
  0.9779776596065038,
  1.0297887019550571,
  1.0142552752973413,
  1.0737213212086887,
  0.9371730316626096,
  0.9705618789764491],
 [1.0770016224786658,
  1.0790496216491359,
  1.077004907618489,
  1.3917693427105766,
  0.9564403026894147,
  1.0604647706742814,
  1.0287137634684858,
  1.152877475618132,
  0.8782932912756866,
  0.9419903609222954],
 [144, 144, 144, 134, 148, 145, 149, 144, 145, 145],
 [24, 24, 24, 29, 21, 25, 19, 23, 29, 24],
 [9, 9, 9, 11, 11, 7, 9, 9, 5, 11],
 [5, 5, 5, 8, 2, 5, 5, 6, 3, 2],
 10.028023481369019,
 0.01307821273803711)

DecisionTreeRegressor¶

In [129]:
from sklearn.tree import DecisionTreeRegressor
In [130]:
modelname = 'DecisionTreeRegressor'
pred_y_test, pred_y_train, trn_time, pred_time = model_trainer(DecisionTreeRegressor(), x_train, y_train, x_test)
error, zero_to_one, one_to_two, two_to_three, greater_than_three = calculate_error_range(pred_y_test, y_test)
visualize_error(error, modelname)

From the above scatter plot, we can see that this model performs worse than the other models.¶

In [131]:
visulaize_performance_of_the_model(pred_y_test, y_test, modelname)

Predictions from the Decision Tree Regressor are not good.¶

In [132]:
visualize_prediction_on_traindata(pred_y_train, y_train, modelname)

The Decision Tree Regressor is highly overfitted on the training set, so this model is not considered.¶

In [133]:
record = pd.DataFrame()
In [134]:
record['Model Name'], record['r2 Score'], record['Mean Absolute Error'], record['Mean Squared Error'], record['Root Mean Squared Error'], record['Error : 0 to 1'], record['Error : 1 to 2'], record['Error : 2 to 3'], record['Error : >3'], record['Training Time(Seconds)'], record['Prediction Time(Seconds)'] = model_name, r2_score_, mae, mse, rmse, error_0_to_1, error_1_to_2, error_2_to_3, error_greater_than_3, training_time, prediction_time
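The long tuple assignment above is easy to misalign. A dict-based construction maps each list to its header explicitly; a minimal sketch with made-up metric values standing in for the real lists:

```python
import pandas as pd

# Hypothetical stand-ins for the metric lists collected above
model_name = ['MultipleLinearRegression', 'RidgeRegression']
r2 = [0.5454, 0.5445]
mae = [0.7426, 0.7431]

# Passing a dict of columns guarantees each list lands under the
# intended header, with no chance of swapping MSE and RMSE.
record = pd.DataFrame({'Model Name': model_name,
                       'r2 Score': r2,
                       'Mean Absolute Error': mae})
```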
In [135]:
record
Out[135]:
Model Name r2 Score Mean Absolute Error Mean Squared Error Root Mean Squared Error Error : 0 to 1 Error : 1 to 2 Error : 2 to 3 Error : >3 Training Time(Seconds) Prediction Time(Seconds)
0 MultipleLinearRegression 0.545371 0.742583 1.077002 1.037787 144 24 9 5 0.009304 0.000998
1 RidgeRegression 0.544507 0.743127 1.079050 1.038773 144 24 9 5 0.002551 0.001225
2 RidgeRegression Tuned 0.545370 0.742607 1.077005 1.037788 144 24 9 5 0.003022 0.001007
3 LassoRegression Tuned 0.412500 0.831895 1.391769 1.179733 134 29 11 8 0.001503 0.001075
4 RandomForestRegressor 0.596263 0.675735 0.956440 0.977978 148 21 11 2 0.176935 0.009002
5 RandomForestRegressor Tuned 0.552352 0.730014 1.060465 1.029789 145 25 7 5 0.121486 0.007001
6 SupportVectorRegressor 0.565755 0.691300 1.028714 1.014255 149 19 9 5 0.023963 0.008002
7 KNeighborsRegressor 0.513342 0.738150 1.152877 1.073721 144 23 9 6 0.001238 0.002003
8 GradientBoosting Regressor 0.629251 0.683217 0.878293 0.937173 145 29 5 3 0.668665 0.011004
9 CatBoostRegressor 0.602363 0.672656 0.941990 0.970562 145 24 11 2 10.028023 0.013078
In [136]:
record.to_csv('EvaluationRecord.csv')
In [137]:
fig = go.Figure()
fig.add_trace(go.Scatter(y=record['r2 Score'],x=record['Model Name'],
                    mode='markers',
                    name='r2_score'))
fig.add_trace(go.Scatter(y=record['Mean Absolute Error'],x=record['Model Name'],
                    mode='markers',
                    name='Mean Absolute Error'))
fig.add_trace(go.Scatter(y=record['Mean Squared Error'],x=record['Model Name'],
                    mode='markers',
                    name='Mean Squared Error'))
fig.add_trace(go.Scatter(y=record['Root Mean Squared Error'],x=record['Model Name'],
                    mode='markers',
                    name='Root Mean Squared Error'))
fig.update_layout(
    title="Evaluation of the Model based on Scoring Metrics",
    xaxis_title="Model",
    yaxis_title="Score",
    legend_title_text="Scoring Metrics",
    font=dict(
        family="Courier New, monospace",
        size=13,
        color="RebeccaPurple"
    )
)
fig.show()

From the above plot, we should choose the models whose error metrics stay low on the graph.¶

From the above graph, we pick the stable models:¶

1) GradientBoosting Regressor,¶

2) CatBoost Regressor,¶

3) RandomForest Regressor,¶

4) SupportVector Regressor,¶

5) KNeighbors Regressor &¶

6) MultipleLinear Regression¶

In [138]:
stacked_barplot = pd.DataFrame()
In [139]:
model_name, error_range, count = [], [], []
In [140]:
for i in range(len(record)):
    model_name.append(record['Model Name'][i])
    model_name.append(record['Model Name'][i])
    model_name.append(record['Model Name'][i])
    model_name.append(record['Model Name'][i])
    error_range.append('Error : 0 to 1')
    error_range.append('Error : 1 to 2')
    error_range.append('Error : 2 to 3')
    error_range.append('Error : >3')
    count.append(record['Error : 0 to 1'][i])
    count.append(record['Error : 1 to 2'][i])
    count.append(record['Error : 2 to 3'][i])
    count.append(record['Error : >3'][i])
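The append loop above can be expressed in a single call with pandas' `melt`, which reshapes the wide error columns into the long format the stacked bar plot needs. A toy sketch with two hypothetical models and two of the four error buckets:

```python
import pandas as pd

# Toy version of the `record` table built above
record = pd.DataFrame({
    'Model Name': ['MultipleLinearRegression', 'RandomForestRegressor'],
    'Error : 0 to 1': [144, 148],
    'Error : 1 to 2': [24, 21],
})

# melt() turns each error column into a (model, error_range, count) row,
# which is exactly what the four repeated append() calls construct by hand.
stacked = record.melt(id_vars='Model Name',
                      var_name='error_range', value_name='count')
```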
In [141]:
model_name, error_range, count
Out[141]:
(['MultipleLinearRegression',
  'MultipleLinearRegression',
  'MultipleLinearRegression',
  'MultipleLinearRegression',
  'RidgeRegression',
  'RidgeRegression',
  'RidgeRegression',
  'RidgeRegression',
  'RidgeRegression Tuned',
  'RidgeRegression Tuned',
  'RidgeRegression Tuned',
  'RidgeRegression Tuned',
  'LassoRegression Tuned',
  'LassoRegression Tuned',
  'LassoRegression Tuned',
  'LassoRegression Tuned',
  'RandomForestRegressor',
  'RandomForestRegressor',
  'RandomForestRegressor',
  'RandomForestRegressor',
  'RandomForestRegressor Tuned',
  'RandomForestRegressor Tuned',
  'RandomForestRegressor Tuned',
  'RandomForestRegressor Tuned',
  'SupportVectorRegressor',
  'SupportVectorRegressor',
  'SupportVectorRegressor',
  'SupportVectorRegressor',
  'KNeighborsRegressor',
  'KNeighborsRegressor',
  'KNeighborsRegressor',
  'KNeighborsRegressor',
  'GradientBoosting Regressor',
  'GradientBoosting Regressor',
  'GradientBoosting Regressor',
  'GradientBoosting Regressor',
  'CatBoostRegressor',
  'CatBoostRegressor',
  'CatBoostRegressor',
  'CatBoostRegressor'],
 ['Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3',
  'Error : 0 to 1',
  'Error : 1 to 2',
  'Error : 2 to 3',
  'Error : >3'],
 [144,
  24,
  9,
  5,
  144,
  24,
  9,
  5,
  144,
  24,
  9,
  5,
  134,
  29,
  11,
  8,
  148,
  21,
  11,
  2,
  145,
  25,
  7,
  5,
  149,
  19,
  9,
  5,
  144,
  23,
  9,
  6,
  145,
  29,
  5,
  3,
  145,
  24,
  11,
  2])
In [142]:
stacked_barplot['model_name'], stacked_barplot['error_range'], stacked_barplot['count'] = model_name, error_range, count
In [143]:
stacked_barplot
Out[143]:
model_name error_range count
0 MultipleLinearRegression Error : 0 to 1 144
1 MultipleLinearRegression Error : 1 to 2 24
2 MultipleLinearRegression Error : 2 to 3 9
3 MultipleLinearRegression Error : >3 5
4 RidgeRegression Error : 0 to 1 144
5 RidgeRegression Error : 1 to 2 24
6 RidgeRegression Error : 2 to 3 9
7 RidgeRegression Error : >3 5
8 RidgeRegression Tuned Error : 0 to 1 144
9 RidgeRegression Tuned Error : 1 to 2 24
10 RidgeRegression Tuned Error : 2 to 3 9
11 RidgeRegression Tuned Error : >3 5
12 LassoRegression Tuned Error : 0 to 1 134
13 LassoRegression Tuned Error : 1 to 2 29
14 LassoRegression Tuned Error : 2 to 3 11
15 LassoRegression Tuned Error : >3 8
16 RandomForestRegressor Error : 0 to 1 148
17 RandomForestRegressor Error : 1 to 2 21
18 RandomForestRegressor Error : 2 to 3 11
19 RandomForestRegressor Error : >3 2
20 RandomForestRegressor Tuned Error : 0 to 1 145
21 RandomForestRegressor Tuned Error : 1 to 2 25
22 RandomForestRegressor Tuned Error : 2 to 3 7
23 RandomForestRegressor Tuned Error : >3 5
24 SupportVectorRegressor Error : 0 to 1 149
25 SupportVectorRegressor Error : 1 to 2 19
26 SupportVectorRegressor Error : 2 to 3 9
27 SupportVectorRegressor Error : >3 5
28 KNeighborsRegressor Error : 0 to 1 144
29 KNeighborsRegressor Error : 1 to 2 23
30 KNeighborsRegressor Error : 2 to 3 9
31 KNeighborsRegressor Error : >3 6
32 GradientBoosting Regressor Error : 0 to 1 145
33 GradientBoosting Regressor Error : 1 to 2 29
34 GradientBoosting Regressor Error : 2 to 3 5
35 GradientBoosting Regressor Error : >3 3
36 CatBoostRegressor Error : 0 to 1 145
37 CatBoostRegressor Error : 1 to 2 24
38 CatBoostRegressor Error : 2 to 3 11
39 CatBoostRegressor Error : >3 2
In [144]:
fig = plx.bar(stacked_barplot, x="model_name", y="count", color="error_range", height=1000)
fig.update_layout(
    title="Performance of the all models on Test Data",
    xaxis_title="Model",
    yaxis_title="Count of Test data",
    legend_title_text="Error Range",
    font=dict(
        family="Courier New, monospace",
        size=13,
        color="RebeccaPurple"
    )
)
fig.show()

GradientBoosting Regressor, CatBoost Regressor, RandomForest Regressor, SupportVector Regressor, KNeighbors Regressor & MultipleLinear Regression are quite stable across different train and test sets.¶

To be honest, we're picking the one best model out of the 10 based on only a single train/test split. In my view, this is not the right way to pick the best model: every algorithm learns in a different way.¶

We have to find the algorithm that learns this dataset better than all the others, and that decision shouldn't rest on a single train & test split.¶

Here, we're going to run some of the algorithms 'n' times with 'n' different train & test splits, calculate the r2 score for each split, and store the scores in a list for each considered algorithm.¶

Then, by visualizing those scores, we can see the real strength of each algorithm, for this dataset in particular.¶

In [145]:
score_gbr,score_cbr,score_rfr,score_svr,score_knr,score_mlr = [],[],[],[],[],[]
for i in tqdm(range(1500)):
    x_train, x_test, y_train, y_test = train_test_split(data.drop(['LC50'], axis = 1), data['LC50'], test_size= 0.2)
    gbr = GradientBoostingRegressor(verbose=0).fit(x_train, y_train)
    cbr = CatBoostRegressor(verbose=0).fit(x_train, y_train)
    rfr = RandomForestRegressor(verbose=0).fit(x_train, y_train)
    svr = SVR().fit(x_train, y_train)
    knr = KNeighborsRegressor().fit(x_train, y_train)
    mlr = LinearRegression().fit(x_train, y_train)
    
    pred_gbr = gbr.predict(x_test)
    score_gbr.append(r2_score(y_test, pred_gbr))
    
    pred_cbr = cbr.predict(x_test)
    score_cbr.append(r2_score(y_test, pred_cbr))
    
    pred_rfr = rfr.predict(x_test)
    score_rfr.append(r2_score(y_test, pred_rfr))
    
    pred_svr = svr.predict(x_test)
    score_svr.append(r2_score(y_test, pred_svr))
    
    pred_knr = knr.predict(x_test)
    score_knr.append(r2_score(y_test, pred_knr))
    
    pred_mlr = mlr.predict(x_test)
    score_mlr.append(r2_score(y_test, pred_mlr))
100%|██████████| 1500/1500 [33:09<00:00,  1.33s/it] 
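As a lighter alternative to the manual loop, scikit-learn's `ShuffleSplit` combined with `cross_val_score` performs the same repeated-random-split r2 scoring in a couple of lines. A sketch on synthetic stand-in data (the random features here are not the fish-toxicity dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for the six molecular descriptors and LC50 target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=200)

# ShuffleSplit draws many independent random 80/20 splits, like the loop
# above; cross_val_score collects one r2 score per split.
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, scoring='r2', cv=cv)
```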
In [146]:
plx.scatter(score_gbr)
In [147]:
plx.scatter(score_cbr)
In [148]:
plx.scatter(score_rfr)
In [149]:
plx.scatter(score_svr)
In [150]:
plx.scatter(score_knr)
In [151]:
plx.scatter(score_mlr)
In [152]:
hist_data = [score_gbr, score_cbr, score_rfr, score_svr, score_knr, score_mlr]
In [153]:
group_labels = ['GradientBoost Regressor',
                'CatBoostRegressor',
                'RandomForest Regressor',
                'SupportVector Regressor',
                'KNeighbors Regressor',
                'MultipleLinear Regression']
In [154]:
import plotly.figure_factory as ff
In [155]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

The above plot is quite messy, so we're going to compare the models in groups. Here, we can clearly observe that the brown and purple lines (MultipleLinear Regression and KNeighbors Regressor) fall in the same range, so we take those two and compare them.¶

In [156]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

From the above distributions, we're comparing two models: MultipleLinear Regression and KNeighbors Regressor. The KNeighbors curve sits a bit lower than MultipleLinear, which says that the r2 score of MultipleLinear Regression is more stable than that of KNeighbors Regressor.¶

So, we can remove KNeighbors from our Evaluation.¶
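The visual "sits lower" reading can be backed up numerically by comparing the mean and spread of the r2 samples. A toy sketch with made-up score lists (stand-ins for `score_mlr` and `score_knr`):

```python
import numpy as np

# Hypothetical r2 samples for the two models being compared
score_mlr = np.array([0.55, 0.57, 0.54, 0.56])
score_knr = np.array([0.50, 0.53, 0.48, 0.52])

# A higher mean confirms the density curve sitting further right;
# a smaller std confirms a narrower, more stable curve.
mlr_mean, mlr_std = score_mlr.mean(), score_mlr.std()
knr_mean, knr_std = score_knr.mean(), score_knr.std()
```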

In [157]:
hist_data = [score_gbr, score_cbr, score_rfr, score_svr, score_mlr]
group_labels = ['GradientBoost Regressor',
                'CatBoostRegressor',
                'RandomForest Regressor',
                'SupportVector Regressor',
                'MultipleLinear Regression']

Now, we have 5 models to evaluate.¶

In [158]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

From the plot, we see that Multiple Linear Regression (the purple curve) sits to the left of the graph, which means its r2 score is lower than the other models'. So, we're going to remove MultipleLinear Regression from our evaluation.¶

In [159]:
hist_data = [score_gbr, score_cbr, score_rfr, score_svr]
group_labels = ['GradientBoost Regressor',
                'CatBoostRegressor',
                'RandomForest Regressor',
                'SupportVector Regressor']
In [160]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

From the above graph, we can see two groups of curves. The orange curve stands completely apart, while the rest form one group. First, we look at that group.¶

The orange curve is the CatBoost Regressor. We're removing that model temporarily from the comparison.¶

In [161]:
hist_data = [score_gbr, score_rfr, score_svr]
group_labels = ['GradientBoost Regressor',
                'RandomForest Regressor',
                'SupportVector Regressor']
In [162]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

The curve of the RandomForest Regressor (the green curve) is stable and good. The rest sit lower and are less stable than the RandomForest Regressor.¶

So, we're going to remove SupportVector and GradientBoosting from our evaluation.¶

In [166]:
hist_data = [score_cbr, score_rfr]
group_labels = [
                'CatBoostRegressor',
                'RandomForest Regressor']
In [167]:
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

Now, we have only two models left from which to decide the best.¶

Here, the distribution of the RandomForest Regressor is tighter than that of the CatBoost Regressor, but the CatBoost curve is not unstable either.¶

When it comes to r2 score, there is a clear difference between the two models.¶

The CatBoost Regressor sits to the right of the RandomForest Regressor, which means the CatBoost Regressor performs better. And not only against RandomForest: the CatBoost Regressor is better than all the other models. We can look at the plot of all models once again.¶

In [169]:
hist_data = [score_gbr, score_cbr, score_rfr, score_svr, score_knr, score_mlr]
group_labels = ['GradientBoost Regressor',
                'CatBoostRegressor',
                'RandomForest Regressor',
                'SupportVector Regressor',
                'KNeighbors Regressor',
                'MultipleLinear Regression']
fig = ff.create_distplot(hist_data, group_labels=group_labels, show_hist=False, show_rug=False)
fig.show()

From the above graph, we can see that the orange curve sits to the right of all the other curves. This means the CatBoost Regressor is the better choice for this dataset.¶